You should take care about statistics!

by oracleninja 18. April 2020 Uncategorized 0

Ever since Oracle introduced the CBO (Cost Based Optimizer) there were some performance issues, mostly because of missing or staled statistics in the database.

I stopped to count the cases, when I just gathered fresh statistics for some table(s) and voilà – the query runs 100x faster!

I will mention just one of the latest cutomer’s issues: a relatively small schema (under 1 GB) gets exported every day from one database – without statistics! – and then imported into another database. As soon as the import job gets done, end users start to query the data. And they claim then:”Uh oh, why is the database so slow!?“

Well, you know …

After my recommendation to gather the fresh statistics for the whole schema after import, the issue has been gone.

But, let me ask you something – do you care manually & proactively about your statistics? Or do you simply let the automatic database job to gather all statistics as needed?

Thinking about 3rd party applications which I faced in the last 15 years, there was only 1 application with its own “statistics concept” to proactively take care about ALL statistics for the involved objects! The others have always blamed the database (or us, the DBAs).

Therefore – my opinion & recommendation is to create your own “statistics concept” to proactively take care about all objects (tables & indexes) used within your applications.

What does it mean? It means, you shouldn’t blindly rely on the automatic task ‘auto optimizer stats collection‘, at least for two reasons:

the job takes sometimes too long and cannot finish the statistics gathering on all involved objects (tables, partitions, indexes) within associated window (you can read more about it here – written by Nigel Bayliss, Product manager for the Oracle Optimizer: https://blogs.oracle.com/optimizer/how-to-gather-optimizer-statistics-fast )
the job can gather the statistics on all columns, creating also mostly unneeded/undesired histograms on them (please, take a look at this chart made by Jonathan Lewis: https://jonathanlewis.wordpress.com/2018/01/23/histograms-2/ )

Instead of that, for every application you run, you should create some sort of “statistics concept”, to have complete control of the statistics.

Here are some tips:

If you do not review & make use of the results of the other two auto tasks (segment space advisor & sql tuning advisor), then you should disable them:

BEGIN
  DBMS_AUTO_TASK_ADMIN.disable(
    client_name => 'auto space advisor',
    operation   => NULL,
    window_name => NULL);

  DBMS_AUTO_TASK_ADMIN.disable(
    client_name => 'sql tuning advisor',
    operation   => NULL,
    window_name => NULL);
END;
/

If you want stable & predictable execution plans, you should lock the statistics, and gather a fresh ones using the parameter FORCE, which allows you to override any lock on statistics.

DBMS_STATS.LOCK_TABLE_STATS
DBMS_STATS.LOCK_PARTITION_STATS
DBMS_STATS.GATHER_TABLE_STATS (
...
force => TRUE);

Gather statistics Concurently & in Parallel, but with care, because you can generate a very high system load.
https://blogs.oracle.com/optimizer/how-to-gather-optimizer-statistics-fast

You should make tests with pending statistics.
https://db-blog.web.cern.ch/blog/franck-pachot/2018-09-optimizer-statistics-gathering-pending-and-history
You should specify only the columns, which are used as predicates in JOIN or WHERE clause, because you don’t need to gather the statistics for dozens of columns which contain some fact data.
Or in some cases you can specify “Method_Opt => ‘FOR ALL INDEXED COLUMNS SIZE AUTO ‘” to gather the statistics only for indexed columns.

BEGIN
  dbms_stats.gather_table_stats(
    ownname => 'APP_SCHEMA',
    tabname => 'APP_TABLE',
    estimate_percent => 100,
    granularity => 'ALL',
    method_opt => 'for columns size 1 ORDER_DATE, CUSTOMER_ID, ORDER_ID',
    cascade => TRUE
  );
END;
/

If you are doing a bulk data load into DWH table with partitions, that will make the partition statistics stale and may also make the global statistics stale. Re-gathering statistics for the effected partitions and for the entire table is often very time consuming.

To avoid it, you can use INCREMENTAL statistics gathering.

INCREMENTAL determines whether the global statistics for a partitioned table are maintained without performing a full table scan.

When a table is partitioned, an application typically loads data into a new partition.
As new partitions are added and data is loaded, global table statistics must be kept up to date.
If the following conditions are met, then the database updates the global table statistics by scanning only the changed partitions instead of the entire table:
- The INCREMENTAL value for the partitioned table is set to TRUE.
- The PUBLISH value for the partitioned table is set to TRUE.
- The user specifies AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT and AUTO for GRANULARITY when gathering statistics on the table.
  
  If the INCREMENTAL value for the partitioned table was set to FALSE (default value), then the database uses a full table scan to maintain the global statistics.

  dbms_stats.set_table_prefs('APP_SCHEMA','APP_PART_TB','INCREMENTAL','TRUE');
dbms_stats.set_table_prefs('APP_SCHEMA','APP_PART_TB','PUBLISH','TRUE');

Change some table preferences. Do not rely on the default settings.

Especially you should take care about the new type of creating synopses (APPROXIMATE_NDV_ALGORITHM), if you upgraded some pre 12.2 database to the newer version.
https://blogs.oracle.com/optimizer/efficient-statistics-maintenance-for-partitioned-tables-using-incremental-statistics-part-3

TABLE_CACHED_BLOCKS can improve your clustering factor.
https://richardfoote.wordpress.com/category/table_cached_blocks/
https://jonathanlewis.wordpress.com/2018/07/02/clustering_factor-5/

DBMS_STATS.SET_TABLE_PREFS('APP_SCHEMA','APP_PART_T','APPROXIMATE_NDV_ALGORITHM','HYPERLOGLOG');
DBMS_STATS.SET_TABLE_PREFS('APP_SCHEMA','APP_PART_T','INCREMENTAL_STALENESS','NULL');
DBMS_STATS.SET_TABLE_PREFS('APP_SCHEMA','APP_PART_T','INCREMENTAL_STALENESS','USE_STALE_PERCENT, USE_LOCKED_STATS');
DBMS_STATS.SET_TABLE_PREFS('APP_SCHEMA','APP_PART_T','TABLE_CACHED_BLOCKS', 16);
DBMS_STATS.SET_TABLE_PREFS('APP_SCHEMA','APP_PART_T','STALE_PERCENT', 0.1);
DBMS_STATS.SET_TABLE_PREFS('APP_SCHEMA','APP_PART_T','DEGREE', DBMS_STATS.AUTO_DEGREE)

You can also sometimes just copy the statistics from some existing partition(s).

DBMS_STATS.COPY_TABLE_STATS

If you have some correlated data, then you can also gather extended statistics for them.
https://blogs.oracle.com/optimizer/how-do-i-know-what-extended-statistics-are-needed-for-a-given-workload
https://blogs.oracle.com/optimizer/extended-statistics
If you also want to have better statistics for your workload, and if you create a lot of objects (tables, partitions, subpartitions, indexes, etc.), then you should also gather fixed object statistics and system statistics.

DBMS_STATS.GATHER_SYSTEM_STATISTICS
DBMS_STATS GATHER_FIXED_OBJECTS_STATS

I will stop here, because you should have now more than enough details to start making your own statistics concept.

Here are some useful articles about this topic (random ordered), which I also used as source for this blog post:

Leave a Reply Cancel reply