The Power of MS SQL Statistics: A Comprehensive Guide

In MS SQL Statistics are collections of information about the distribution of data within tables or indexed views. These statistics are used by the Query Optimizer to make informed decisions about how to execute queries efficiently.

ms sql statistics

The Query Optimizer uses statistics to create query plans that improve query performance. For most queries, the Query Optimizer already generates the necessary statistics for a high-quality query plan; in some cases, you need to create additional statistics or modify the query design for the best results.

The Query Optimizer’s role is to find the cheapest, not the best, cost-effective execution plan for the query.

What is SQL Statistics?

Statistics refer to metadata about the distribution of values in one or more columns of a table or indexed view. These statistics are used by the SQL Server query optimizer to make decisions about how to retrieve or manipulate data efficiently.

When a query is submitted to SQL Server, the query optimizer analyzes the available statistics to generate an execution plan that determines how the query should be executed. Statistics help the optimizer estimate the number of rows that will be returned by a query and make decisions about the most efficient way to access the data, such as choosing the appropriate indexes or join algorithms.

Key Functions of MS SQL Statistics:

  1. Optimizing Query Execution:Statistics empower the query optimizer to create efficient execution plans. By analyzing the distribution of values, SQL Server can estimate the number of rows returned by a query, facilitating decisions on the most effective data access methods.
  2. Automatic Statistic Generation:SQL Server automatically generates and updates statistics for columns involved in query predicates or join conditions. This ensures that the optimizer has the most up-to-date information to make accurate decisions.
  3. Manual Statistics Management:While automatic updates are generally sufficient, there are scenarios where manual intervention is beneficial. The CREATE STATISTICS and UPDATE STATISTICS statements allow users to take control of the statistics creation process, offering customization based on specific requirements.
  4. Crucial Role in Query Tuning:Understanding how the query optimizer uses statistics is fundamental for database administrators and developers. It empowers them to fine-tune queries, optimize indexing, and enhance overall database performance.

How to create statistics in SQL?

SQL Server automatically creates and updates statistics for columns that are involved in query predicates or join conditions. However, there are situations where you might need to manage statistics manually. For example, you can create or update statistics using the CREATE STATISTICS or UPDATE STATISTICS statements. This can be useful in scenarios where automatic statistics updates might not be sufficient or when you want to customize the statistics creation process.

How Statistics Works in SQL Server?

Imagine a vast library holding countless books. When you need a specific book, searching through every single one would be painstakingly slow. This is where a well-organized catalog comes in – it provides an overview of the books, allowing you to locate the desired one quickly.

Similarly, MS SQL statistics act as a catalog for your database. By analyzing the summarized data, the query optimizer can:

  • Estimate the number of rows a query will return for each step in the execution plan.
  • Choose the most efficient join order by understanding the relationship between tables.
  • Determine the best index to use for filtering data, if any exist.
  • Predict the cost (resources like CPU, memory) of executing each plan.

Based on these estimations, the optimizer selects the plan with the lowest estimated cost, effectively streamlining the query execution process. (MS SQL Statistics)

How Up-to-Date Statistics Enhance Performance?

Imagine the library catalog being outdated, with inaccurate information about the book locations. Searching for a book would become frustrating and time-consuming. The same principle applies to MS SQL statistics. Outdated statistics can lead to:

  • Suboptimal query plans: The optimizer might choose inefficient plans based on inaccurate data.
  • Slow query execution times: Queries might take longer to run due to poor plan selection.
  • Increased resource consumption: Inefficient plans can consume more CPU, memory, and I/O resources.

Components of the Statistics:

There are multiple components of the statistics, we will cover main components here:

  • Histogram
  • Density vector

1) Histogram

histogram measures the frequency of occurrence for each distinct value in a data set.

The Query Optimizer computes a histogram on the column values in the first key column of the statistics object, selecting the column values by statistically sampling the rows or by performing a full scan of all rows in the table or view.

2) Density vector

Density is information about the number of duplicates in a given column or combination of columns and it is calculated as 1/(number of distinct values).

The Query Optimizer uses densities to enhance cardinality estimates for queries that return multiple columns from the same table or indexed view. As density decreases, selectivity of a value increases.

For example, in a table representing mobiles, many mobiles have the same manufacturer, but each mobile has a unique mobile Equipment Identification number (IMEI). An index on the IMEI is more selective than an index on the manufacturer because IMEI has a lower density than the manufacturer.

What is a Cardinality Estimate?

(MS SQL Statistics) A cardinality estimate is the estimated number of rows, the optimizer believes will be returned by a specific operation in the execution plan.

Statistics options

There are options that affect when and how statistics are created and updated. These options are configurable at the database level only.

1) AUTO_CREATE_STATISTICS option:

When the automatic create statistics option, AUTO_CREATE_STATISTICS is ON, the Query Optimizer creates statistics on individual columns in the query predicate, as necessary, to improve cardinality estimates for the query plan. These single-column statistics are created on columns that don’t already have a histogram in an existing statistics object.

The AUTO_CREATE_STATISTICS option does not determine whether statistics get created for indexes. This option also does not generate filtered statistics. It applies strictly to single-column statistics for the full table. When the Query Optimizer creates statistics as a result of using the AUTO_CREATE_STATISTICS option, the statistics name starts with _WA.

2) AUTO_UPDATE_STATISTICS option

a) AUTO_UPDATE_STATISTICS is ON

When the automatic update statistics option, AUTO_UPDATE_STATISTICS is ON, the Query Optimizer determines when statistics might be out-of-date and then updates them when they are used by a query. This action is also known as statistics recompilation. Statistics become out-of-date after modifications from the insert, update, delete, or merge operations change the data distribution in the table or indexed view.

The Query Optimizer determines when statistics might be out-of-date by counting the number of row modifications since the last statistics update and comparing the number of row modifications to a threshold. The threshold is based on the table cardinality, which can be defined as the number of rows in the table or indexed view.

b) AUTO_UPDATE_STATISTICS option is OFF

Marking statistics as out-of-date based on row modifications occurs even when the AUTO_UPDATE_STATISTICS option is OFF. When the AUTO_UPDATE STATISTICS option is OFF, statistics are not updated, even when they are marked as out-of-date. Plans will continue to use the out-of-date statistics objects.

Setting AUTO_UPDATE_STATISTICS to OFF can cause suboptimal query plans and degraded query performance. Setting the AUTO_UPDATE STATISTICS option to ON is recommended. The Query Optimizer checks for out-of-date statistics before compiling a query and before executing a cached query plan. Before compiling a query, the Query Optimizer uses the columns, tables, and indexed views in the query predicate to determine which statistics might be out-of-date. Before executing a cached query plan, the Database Engine verifies that the query plan references up-to-date statistics.

The AUTO_UPDATE_STATISTICS option applies to statistics objects created for indexes, single-columns in query predicates, and statistics created with the CREATE STATISTICS statement. This option also applies to filtered statistics. You can use the sys.dm_db_stats_properties to accurately track the number of rows changed in a table and decide if you wish to update statistics manually. (MS SQL Statistics)

AUTO_UPDATE_STATISTICS is always OFF for memory-optimized tables.

When to create statistics?

The Query Optimizer already creates statistics in the following ways:

  • The Query Optimizer creates statistics for indexes on tables or views when the index is created. These statistics are created on the key columns of the index.
  • The Query Optimizer creates statistics for single columns in query predicates when AUTO_CREATE_STATISTICS is on.
    Consider creating statistics with the CREATE STATISTICS statement when any of the following applies:
    • The Database Engine Tuning Advisor suggests creating statistics.
    • The query predicate contains multiple correlated columns that are not already in the same index.
    • The query selects from a subset of data.
    • The query has missing statistics.

The query identifies missing statistics

If an error or other event prevents the Query Optimizer from creating statistics, the Query Optimizer creates the query plan without using statistics. The Query Optimizer marks the statistics as missing and attempts to regenerate the statistics the next time the query is executed. Missing statistics are indicated as warnings (table name in red text) when the execution plan of a query is graphically displayed using SQL Server Management Studio.

Additionally, monitoring the Missing Column Statistics event class by using SQL Server Profiler indicates when statistics are missing. If statistics are missing, perform the following steps:

  1. Verify that AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS are ON.
  2. Verify that the database is not read-only. If the database is read-only, a new statistics object cannot be saved.
  3. Create the missing statistics by using the CREATE STATISTICS statement.

How to check statistics in MS SQL Server?

1) DBCC SHOW_STATISTICS

DBCC SHOW_STATISTICS displays current query optimization statistics for a table or indexed view. The query optimizer uses statistics to estimate the cardinality or number of rows in the query result, which enables the Query Optimizer to create a high-quality query plan.

For example, the Query Optimizer could use cardinality estimates to choose the index seek operator instead of the index scan operator in the query plan, improving query performance by avoiding a resource-intensive index scan.

The Query Optimizer stores statistics for a table or indexed view in a statistics object. For a table, the statistics object is created on either an index or a list of table columns. The statistics object includes a header with metadata about the statistics, a histogram with the distribution of values in the first key column of the statistics object, and a density vector to measure cross-column correlation.

The Database Engine can compute cardinality estimates with any of the data in the statistics object. DBCC SHOW_STATISTICS displays the header, histogram, and density vector based on data stored in the statistics object. The syntax lets you specify a table or indexed view along with a target index name, statistics name, or column name. (MS SQL Statistics)

2) STATS_DATE 

Returns the date of the most recent update for statistics on a table or indexed view.

USE AdventureWorks2012; 
GO 
SELECT name AS stats_name,  
    STATS_DATE(object_id, stats_id) AS statistics_update_date 
FROM sys.stats  
WHERE object_id = OBJECT_ID(‘Person.Address’); 

3) sys.dm_db_stats_properties

Returns properties of statistics for the specified database object (table or indexed view) in the current SQL Server database.

SELECT * FROM sys.dm_db_stats_properties (object_id(‘Person.Person’), 1);

How to update statistics?

SP_UPDATESTATS

sp_updatestats executes UPDATE STATISTICS, by specifying the ALL keyword, on all user-defined and internal tables in the database. sp_updatestats displays messages that indicate its progress. When the update is completed, it reports that statistics have been updated for all tables. sp_updatestats updates statistics on disabled nonclustered indexes and does not update statistics on disabled clustered indexes.

For disk-based tables, sp_updatestats updates statistics based on the modification_counter information in the sys.dm_db_stats_properties catalog view, updating statistics where at least one row has been modified. Statistics on memory-optimized tables are always updated when executing sp_updatestats. Therefore do not execute sp_updatestats more than necessary. sp_updatestats can trigger a recompile of stored procedures or other compiled code. However, sp_updatestats might not cause a recompile, if only one query plan is possible for the tables referenced and the indexes on them. (MS SQL Statistics)

A recompilation would be unnecessary in these cases even if statistics are updated.

USE AdventureWorks2012; 
GO 
EXEC sp_updatestats;  

Conclusion:

By understanding the power of MS SQL statistics and implementing best practices, you equip your database with the knowledge it needs to optimize query execution. This translates to faster performance, efficient resource utilization, and a smoother user experience for your applications. Remember, maintaining up-to-date and well-designed statistics is a crucial step towards maximizing the potential of your MS SQL Server.


FAQ:

What are MS SQL Statistics?

MS SQL Statistics are collections of data that analyze the distribution of values within tables and indexed views in Microsoft SQL Server. Think of them as summaries of your data that help the server understand what kind of information it’s working with. The SQL Server query optimizer uses these statistics to estimate how many rows a query will return (cardinality). This allows it to choose the most efficient execution plan for your queries, ultimately improving performance.

Why are MS SQL Statistics important?

Accurate and up-to-date statistics are crucial for optimal query performance in MS SQL Server. Inaccurate statistics can mislead the query optimizer into choosing inefficient plans, leading to slow queries and wasted resources. For instance, the optimizer might choose a full table scan instead of a faster index seek if it has outdated statistics.

How can I update MS SQL Statistics?

MS SQL Server can automatically create and update statistics when you create indexes or certain queries are run. However, it’s good practice to manually update statistics periodically, especially after significant data changes occur in your tables. You can use the UPDATE STATISTICS statement in Transact-SQL (T-SQL) to update statistics for specific tables, columns, or indexes. There are also options for scheduling automatic updates as part of SQL Server maintenance plans.

Leave a comment