Essential MS SQL Server Performance Counter

For list of others essential performance counters for applications based on Windows platform see here.


When there is a requirement of collecting performance counters during performance testing, it becomes tough to select few counters from existing many.  All counters are important in its context and cannot be simply ignored.  So, it is very important to recheck the list of counters that needs to be monitored before every performance run.  You might need to change the counters as per yours requirements.

Following are some of the MS SQL Server counters, that I find bare essential to start with though.  The list of counters and its preferred  / threshold values are being collected from multiple relevant blogs / links and have been mentioned in the ‘References’ section below.

Performance Object
Performance Counter
Preferred Value
Access Methods
Forwarded records / sec
< 10 out of 100 batch requests / sec
Full scans / sec
(Index Searches / sec) / (Full scans / sec) > 1000
Index searches / sec
(Index Searches / sec) / (Full scans / sec) > 1000
Page splits / sec
< 20 per 100 Batch requests / sec
Table lock escalations / sec
None
Workfiles created / sec
None
Worktables created / sec
< 20
Buffer Manager
Buffer cache hit ratio
> 90%
Checkpoint pages / sec
< 300
Free list stalls / sec
< 2
Free pages
> 640
Lazy writes / sec
< 20
Page life expectancy
> 300
Page lookups / sec
(Page lookups / sec) / (Batch requests / sec) < 100
Page reads / sec
< 90
Page writes / sec
< 80
Databases
Log growths
None
Transactions / sec
None
General Statistics
Logins / sec
< 2
Logouts / sec
< 2
User connections
None
Latches
Average latch wait time (ms)
< 1
Latches waits / sec
(Total Latch Wait Time) / (Latch Waits / Sec) < 10
Total latch wait time (ms)
(Total Latch Wait Time) / (Latch Waits / Sec) < 10
Locks
Average wait time (ms)
< 500
Lock requests / sec
< 1000
Lock timeouts (timeouts > 0) / sec
Nil
Lock wait time (ms)
None
Lock waits / sec
Nil
Number of deadlocks / sec
Nil
Cache Manager
Cache hit ratio
> 90%
SQL Errors
Errors / sec
Nil
SQL Statistics
Batch requests / sec
< 1000
SQL compilations / sec
< 10% of batch requests / sec
SQL recompilations / sec
< 10% of SQL compilations / sec



Access Methods

Forwarded records / sec
It indicates number of records fetched through forwarded record pointers.
Preferred Value:  Less than 10 out of 100 batch requests / sec
Threshold Value:  See preferred value
Notes:
  1. High value of this counter can be avoided by using default values, using char instead of varchar i.e. variable size string.
  2. Tables with NO clustered index can fetch records through forwarded record pointers. If you start out with a short row, and update the row creating a wider row, the row may no longer fit on the data page. A pointer will be put in its place and the row will be forwarded to another page.
  3. Adding clustered index is simplest way to eliminate problem
  4. Occur when row / record moved from one database page to another because changed record cannot fit back in original page

Full scans / sec
It indicates the number of unrestricted full scans. These can either be base table or full index scans.
Preferred Value:  (Index Searches / sec) / (Full scans / sec) > 1000
Threshold Value:  See preferred value
Notes:
  1. It indicates Unrestricted linear searches through table or index for example "SELECT * FROM TABLE"
  2. SQL Query "Estimated Execution Plan" can identify them ahead of time.
  3. SQL Query "Actual Execution Plan" and SQL Profiler (trace) can identify them when they occur
  4. Compare Full scans / sec with Forwarded records / sec
  5. Performance can be reduced because of too high disk access.
  6. If we see high CPU then we need to investigate this counter, otherwise if the full scans are on small tables we can ignore this counter.
  7. Few of the main causes of high full scans / sec are Missing indexes, Too many rows requested
Index searches / sec
It indicates number of index searches.
Preferred Value:  (Index Searches / sec) / (Full scans / sec) > 1000
Threshold Value:  See preferred value
Notes:
  1. Index searches are used to start range scans, single index record fetches, and to reposition within an index.
Page splits / sec
It indicates number of page splits per second that occur as a result of overflowing index pages.
Preferred Value:  < 20 per 100 Batch requests / sec
Threshold Value:  See preferred value
Notes:
  1. It can be avoided by having proper fill factor.
  2. Interesting counter that can lead us to our table / index design.  This value needs to be low as possible
  3. If you find that there is a very strong correlation between page splits and excessive disk I/O activity, then most likely you have a page splitting problem.
  4. The problem can be remedied by Proper fill factor, Rebuild yours indexes more often, Get a faster I/O subsystem
Table lock escalations / sec
It indicates the number of times locks on a table were escalated.
Preferred Value:  None
Threshold Value:  None
Notes:
  1. Row, key or page locks automatically escalated to coarser table locks as appropriate
  2. A high number needs revisit to the query and the indexes on the table
Workfiles created / sec
It indicates number of work files created per second.
Preferred Value:  None
Threshold Value:  None
Notes:
  1. Work files could be used to store temporary results for hash joins and hash aggregates.
  2. It helps in identifying performance issues related to tempDB growth
Worktables created / sec
It indicates number of work tables created per second.
Preferred Value:  Less than 20
Threshold Value:  20 or more
Notes:
  1. Work tables could be used to store temporary results for query spool, LOB variables, XML variables, and cursors.
  2. It can be used to monitor tempDB growth.
  3. Temporary tables created in tempdb by SQL Server to perform a sort, union or group by operation.


Buffer Manager


Buffer cache hit ratio
It indicates the percentage of pages that were found in the buffer pool without having to incur a read from disk.
Preferred Value:  Greater than 90%
Threshold Value:  85% or less
Notes:
  1. When this percentage is high, your server is operating at optimal disk I/O efficiency.  If this value decreases over time, you might consider adding physical memory to the server.
  2. If the DBA is seeing low readings for the buffer cache hit ratio, the Page Life Expectancy statistic should be checked.
Checkpoint pages / sec
It indicates number of pages flushed by checkpoint or other operations that require all dirty pages to be flushed.
Preferred Value:  < 300
Threshold Value:  300 or more
Notes:
  1. When a checkpoint occurs, all dirty pages are written to disk. This is a normal procedure and will cause this counter to rise during the checkpoint process. What you don't want to see is a high value for this counter over time. This can indicate that the checkpoint process is running more often than it should, which can use up valuable server resources. If this has a high figure (and this will vary from server to server), consider adding more RAM to reduce how often the checkpoint occurs, or consider increasing the "recovery interval" SQL Server configuration setting.
  2. This value is relative, it varies from server to server, we need to compare the average to a base line capture to tell if the value is high or low.
Free list stalls / sec
It indicates the number of requests that had to wait for a free page.
Preferred Value:  < 2
Threshold Value:  3 or more
Notes:
  1. It is the frequency with which requests for available database pages are suspended because no buffers are available.
Free pages
It indicates total number of pages on all free lists.
Preferred Value:  > 640
Threshold Value:  Consistently close to zero
Notes:
  1. It is an Indicator of insufficient SQL server memory
  2. If Free Pages shows an average number that is rather poor, but none of the other counters that tracks memory show a bad number, then you’re a-okay.
Lazy writes / sec
It indicates number of buffers written by buffer manager's lazy writer.
Preferred Value:  < 20
Threshold Value:  20 or more
Notes:
  1. This counter tracks how many times a second that the Lazy Writer process is moving dirty pages from the buffer to disk in order to free up buffer space. Generally speaking, this should not be a high value, say more than 20 per second or so. Ideally, it should be close to zero. If it is zero, this indicates that your SQL Server's buffer cache is plenty big and SQL Server doesn't have to free up dirty pages, instead waiting for this to occur during regular checkpoints. If this value is high, then a need for more memory is indicated
Page life expectancy
It indicates number of seconds a page will stay in the buffer pool without references.
Preferred Value:  Greater than 300
Threshold Value:  300 or less
Notes:
  1. If the value is less it indicates SQL cache is cold, Memory problems, Missing indexes etc.
Page lookups / sec
It measures number of times database attempted to find a page in buffer pool.
Preferred Value:  (Page lookups / sec) / (Batch requests / sec) < 100
Threshold Value:  See Preferred Value
Notes:
  1. It performs a logical read rather than the physical read.
  2. Useful for corroborating and further quantifying buffer cache hit ratio.
  3. Compare page reads / sec with page lookups / sec.
  4. When the value much greater than preferred value then it is an indication that while query plans are looking up data in the buffer pool, these plans are insufficient.  Identify queries with the highest amount of logical I/Os and tune them.
Page reads / sec
It indicates number of physical database page reads issued.
Preferred Value:  Less than 90
Threshold Value:  See preferred value
Notes:
  1. It measures physical I/O and not logical I/O.
  2. High value of this counter might indicates Insufficient database memory, Application improperly accessing database, Improper database table implementation or indexing or memory constraints
Page writes / sec
It indicates number of physical database page writes issued.
Preferred Value:  Less than 80
Threshold Value:  90 or more
Notes:
  1. 80 – 90 per second is normal, anything more we need to check the lazy writer/sec and checkpoint counters, if these counters are also relatively high then, it’s memory constraint.


Databases


Log growths
It indicates total number of log growths for this database.
Preferred Value:  None
Threshold Value:  None
Notes:
  1. If there is lot of activity in this counter we need to allocate static and large enough space for our log files.
Transactions / sec
It indicates the number of transactions started for the database
Preferred Value:  None
Threshold Value:  None
Notes:
  1. It indicates the average utilization of the server


General Statistics


Logins / sec
It indicates total number of logins started per second.
Preferred Value:  Less than 2
Threshold Value:  2 or more
Notes:
  1. Greater than 2 per second indicates that the application is not correctly using 'Connection Pooling'

Logouts / sec

It indicates total number of logouts started per second.
Preferred Value:  Less than 2
Threshold Value:  2 or more
Notes:
  1. Greater than 2 per second indicates that the application is not correctly using 'Connection Pooling'
User Connections
It indicates the number of users connected to the system.
Preferred Value:  None
Threshold Value:  None
Notes:
  1. It is recommended to review this counter with "Batch Requests / sec".  A surge in "user connections" may result in surge of "Batch Requests / sec"
  2. With a blocking problem, for example you might see user connections, lock waits and lock wait time all increase while batch requests per second decreases.


Latches


Average latch wait time (ms)
It indicates average latch wait time (milliseconds) for latch requests that had to wait.
Preferred Value:  Less than 1
Threshold Value:  1 or more
Notes:
  1. It indicate large number of physical I/Os or long I/O times
  2. If value greater than 1 then often coincide with low buffer cache hit ratio
  3. If this number is high, yours server might have resource limitations.
Latches waits / sec
It indicates number of latch requests that could not be granted immediately and had to wait before being granted.
Preferred Value:  (Total Latch Wait Time) / (Latch Waits / Sec) < 10
Threshold Value:  See Preferred Value
Notes:
  1. None
Total latch wait time (ms)
It indicates total latch wait time (milliseconds) for latch requests that had to wait in the last second.
Preferred Value:  (Total Latch Wait Time) / (Latch Waits / Sec) < 10
Threshold Value:  See Preferred Value
Notes:
  1. None

Locks

Average wait time (ms)
It indicates the average amount of wait time (milliseconds) for each lock request that resulted in a wait.
Preferred Value:  Less than 500
Threshold Value:  500 or more
Notes:
  1. Waiting hints for performance issues, lower the value the better it is.
  2. If the value goes higher then 500, there may be blocking going on; we need to run blocker script to identify blocking.
  3. A high value indicates Memory Pressure, IO Subsystem problem, Improper Indexes, Improper SQL Structures, Improper placement of files in the disk subsystem, Incorrect usage of SQL isolation level
Lock requests / sec
It indicates the number of new locks and lock conversions requested from the lock manager
Preferred Value:  Less than 1000
Threshold Value:  1000 or more
Notes:
  1. It indicates the number of requests for a type of lock per second. Lock requests/sec > 1000 indicates that the queries are accessing large number of rows; the next step is to review high read queries. If you also see high Avg. Wait time, then it’s an indication of blocking, then review the blocking script output.
Lock timeouts (timeouts > 0) / sec
It indicates number of lock requests that timed out. This does not include requests for NOWAIT locks.
Preferred Value:  Nil
Threshold Value:  None
Notes:
  1. It indicates # of lock requests that exceed maximum specified wait time.
Lock wait time (ms)
It indicates total wait time (milliseconds) for locks in the last second
Preferred Value:  None
Threshold Value:  None
Notes:
  1. For “Lock Wait Time” it is recommended to look beyond the Avg value.  Look for any peaks that are close (or exceeds) to a wait of 60 sec.
  2. Though this counter counts how many total milliseconds SQL Server is  waiting on locks during the last second, but the counter actually records  at the end of locking event.  So most probably the peaks represent one huge locking event.  If those events exceeds more than 60seconds then they may have extended blocking and could be an issue. In such cases, thoroughly analyze the blocking script output. Some applications are written for timing out after 60 seconds and that’s not acceptable response for those applications.
Lock waits / sec
It indicates number of lock requests that could not be satisfied immediately and required the caller to wait before being granted the lock.
Preferred Value:  0
Threshold Value:  More than 0
Notes:
  1. This counter reports how many times users waited to acquire a lock over the past second.
  2. If this value is nonzero then it is an indication that there is at least some level of blocking occurring.  If you combine this with the Lock Wait Time counter, you can get some idea of how long the blocking lasted.
  3. Correlate this counter with Lock wait time (ms)
Number of deadlocks / sec
It indicates number of lock requests that resulted in a deadlock
Preferred Value:  0
Threshold Value:  Greater than zero
Notes:
  1. SQL profiler can provide information about how deadlock was created.


Cache Manager


Cache Hit Ratio
It indicates ratio between cache hits and lookups
Preferred Value:  Greater than 90%
Threshold Value:  Less than 85%
Notes:
  1. Lower values indicate too many ad-hoc queries
  2. Lower values often associated with high values of SQL compilations / sec and SQL re-compilations / sec


SQL Errors


Errors / sec
It indicates number of errors per second.
Preferred Value:  Nil
Threshold Value:  None
Notes:
  1. Monitoring errors always provide quality aspects of application.


SQL Statistics


Batch requests / sec
It indicates the number of SQL batch requests received by server
Preferred Value:  Less than 1000
Threshold Value:  1000 or more
Notes:
  1. This counter is required for other correlation (e.g. SQL compilations / sec).
  2. Batch requests / sec can be compared with System Context Switches / sec to highlight need for SQL server connection affinity
  3. Generally over 1000 batch requests / sec, indicates a very busy SQL server
  4. From a network bottleneck approach, a typical 100Mbps network card is only able to handle about 3000 batch requests / sec.  If you have a server that is this busy, you may need to have 2 or more network cards.
  5. Sometimes low batch requests/sec can be misleading.  If there were a SQL statements/sec counter, this would be a more accurate measure of the amount of SQL Server activity.  For example, an application may call only a few stored procedures yet each stored procedure does lot of work.  In that case, we will see a low number for batch requests/sec but each stored procedure (one batch) will execute many SQL statements that drive CPU and other resources.
SQL compilations / sec
It indicates number of SQL compilations
Preferred Value:  Less than 10% of total Batch Requests / sec
Threshold Value:  40% of total Batch Requests / sec
Notes:
  1. If stored procedure requested after removal or invalidation, it is recompiled.
  2. High compilation rates frequently corresponds with lower Cache Manager: Cache Hit Ratios
  3. High compilation rates indicates lack of stored procedure usage
  4. High compilation rates indicates possible memory shortage  
  5. If you see a high value such as over 100, then it’s an indication that there are lots of adhoc queries that are running, might cause CPU usage, solution is to re-write these adhoc as stored procedure or use sp_executeSQL
SQL Re-compilations / sec
It indicates number of SQL re-compilations
Preferred Value:  Nil
Threshold Value:  10 % or more of SQL compilations / sec
Notes:
  1. A recompile can cause deadlocks and compile locks that are not compatible with any locking type.


References
8.  Forum




Comments

Popular posts from this blog

Performance Test Run Report Template

Bugs Management in Agile Project

Understanding Blockchain