Essential Windows Performance Counters

For a list of other essential performance counters for Windows-based applications, see here.


When performance counters have to be collected during performance testing, it is tough to select a few from the many that exist.  Every counter is important in its own context and cannot simply be ignored.  So, it is very important to recheck the list of counters to be monitored before every performance run.  You might need to change the counters as per your requirements.


Following are some of the Windows counters that I find to be the bare essentials to start with.  The counters and their preferred / threshold values have been collected from multiple relevant blogs / links, which are listed in the ‘References’ section below.


Performance Object   Performance Counter          Preferred Value
-------------------  ---------------------------  --------------------------------------------
Logical Disk         % Free Space                 > 10%
Memory               Available Bytes              > 20% of Physical RAM
Memory               % Committed Bytes in Use     < 80%
Memory               Page Reads / sec             < 5
Memory               Page Writes / sec            As low as possible
Physical Disk        % Disk Time                  < 80%
Physical Disk        Avg. Disk Queue Length       None
Physical Disk        Current Disk Queue Length    (Queue Length - # Spindles on the disks) < 3
Physical Disk        Disk Bytes / sec             As high as possible
Physical Disk        Disk Transfers / sec         < 50 per physical disk
Physical Disk        Split IO / sec               None
Process              % Processor Time             < 70%
Process              Private Bytes                < 60% of RAM
Process              Working Set                  None
Process              Virtual Bytes                None
Processor            % Interrupt Time             < 50%
Processor            % Privileged Time            None
Processor            % Processor Time             < 80%
Processor            Interrupts/sec               < 1000
System               Context Switches/sec         1500 – 3000 per processor
System               Processor Queue Length       < 3 * (Number of Processors)
System               System Calls/sec             None
Network Interface    Bytes Total / sec            < 50% of Current Bandwidth of NIC


Logical Disk

% Free Space
It reports the unallocated disk space as a percentage of the total usable space on the logical volume.
Preferred Value:  > 10%
Threshold Value:  < 5%
Notes:
  1. It is advisable to monitor each logical volume rather than the _Total instance.  For example, the C drive can get filled by default log files or historical data and slow down the overall system.
  2. There is no % Free Space counter for the Physical Disk object, so if the free space of the physical disk is required rather than that of individual logical volumes, the _Total instance can be used.
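The preferred and threshold values above can be turned into a simple per-volume check. The following is a minimal sketch in Python, assuming the % Free Space samples have already been collected for each logical volume. The function name `classify_free_space`, the status labels and the sample values are hypothetical; only the 10% and 5% limits come from the values above.

```python
def classify_free_space(percent_free, preferred=10.0, threshold=5.0):
    """Classify one % Free Space sample for a logical volume."""
    if percent_free < threshold:
        return "critical"   # below the 5% threshold value
    if percent_free <= preferred:
        return "warning"    # not above the 10% preferred value
    return "ok"


# Hypothetical samples, keyed by logical volume instance (not _Total).
samples = {"C:": 4.2, "D:": 8.0, "E:": 35.5}
report = {volume: classify_free_space(pct) for volume, pct in samples.items()}
print(report)
```

Checking each volume separately keeps a nearly full C drive from being hidden behind a large, mostly empty data volume.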

Memory

Available Bytes
It shows the amount of physical memory (in bytes) immediately available for allocation to processes or for system use.
Preferred Value:  > 20% of physical RAM
Threshold Value:  < 4 MB for a sustained period of time
Notes:
  1. As Available Bytes decreases, paging increases.
  2. If Committed Bytes doesn’t decrease and Available Bytes doesn’t increase when the program is closed, the system may have a memory leak.
% Committed Bytes in Use
It shows the ratio of Committed Bytes to Commit Limit.
Preferred Value:  < 80%
Threshold Value:  > 90%
Notes:
  1. If Available Bytes is decreasing and % Committed Bytes in Use is increasing, and neither returns to its initial state, it might suggest a memory leak in a program.
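The "does not return to its initial state" check can be sketched as a comparison between the first and the last few samples of a run. This is only an illustration: the function name `suspect_memory_leak`, the `window` size and the 5% `tolerance` are assumptions, not values taken from the references, and the samples are expected to start before the program is launched and end after it has been closed.

```python
from statistics import mean


def suspect_memory_leak(available_bytes, committed_pct, window=3, tolerance=0.05):
    """Return True if memory has not returned towards its initial state.

    Both arguments are equally spaced samples of Available Bytes and
    % Committed Bytes in Use covering the whole run.
    """
    if min(len(available_bytes), len(committed_pct)) < 2 * window:
        raise ValueError("need at least 2 * window samples per counter")
    # Compare the average of the first and the last `window` samples.
    available_start = mean(available_bytes[:window])
    available_end = mean(available_bytes[-window:])
    committed_start = mean(committed_pct[:window])
    committed_end = mean(committed_pct[-window:])
    # `tolerance` is an assumed relative margin that absorbs normal noise.
    available_dropped = available_end < available_start * (1 - tolerance)
    committed_rose = committed_end > committed_start * (1 + tolerance)
    return available_dropped and committed_rose


# Hypothetical run: memory is not given back after the program is closed.
leaky_available = [8_000_000_000] * 3 + [6_000_000_000] + [5_000_000_000] * 4
leaky_committed = [40] * 3 + [55] + [60] * 4
print(suspect_memory_leak(leaky_available, leaky_committed))
```

A positive result is only a hint; the per-process counters described later are still needed to find the program responsible.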


Page Reads / sec
It shows the rate, in incidents per second, at which the disk was read to resolve hard page faults.
Preferred Value:  Less than 5
Threshold Value:  Greater than 5 for a sustained period of time
Notes:
  1. It is the primary indicator of memory shortage.
  2. It shows the number of read operations, without regard to the number of pages retrieved in each operation.  To find the number of pages read per operation, monitor Pages Input / sec as well.
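Both ideas above, pages per read operation and a value that stays above 5 for a sustained period, can be sketched in a few lines of Python. The function names are hypothetical, and `min_consecutive` is an assumption, since the references do not define how many samples make a period "sustained".

```python
def pages_per_read(pages_input_per_sec, page_reads_per_sec):
    """Average number of pages retrieved per hard-fault read operation."""
    if page_reads_per_sec == 0:
        return 0.0  # no read operations in this interval
    return pages_input_per_sec / page_reads_per_sec


def is_sustained_above(samples, limit, min_consecutive):
    """True if at least min_consecutive samples in a row exceed limit."""
    run = 0
    for value in samples:
        run = run + 1 if value > limit else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical Page Reads / sec samples checked against the limit of 5.
page_reads = [2, 7, 8, 9, 3]
print(is_sustained_above(page_reads, limit=5, min_consecutive=3))
print(pages_per_read(24.0, 6.0))
```

A single spike above 5 resets nothing in practice; only a run of consecutive high samples is reported, which matches the "sustained" wording of the threshold.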
Page Writes / sec
Similar to Page Reads/sec, this counter indicates how many times the disk was written to in an effort to clear unused items out of memory.
Preferred Value:  As low as possible
Threshold Value:  None
Notes:
  1. Increasing values in this counter often indicate a building tension in the battle for memory resources.
  2. It should always be reported together with Page Reads / sec to corroborate the finding of a memory constraint.
  3. Pages are written to disk only if they are changed while in physical memory, so they are likely to hold data and not code.

Physical Disk


% Disk Time
It indicates the percentage of time the hard disk is busy.
Preferred Value:  Less than 80%
Threshold Value:  80%
Notes:
  1. It indicates overall disk utilization.
  2. An average value of 80% or above indicates that the hard disk can’t keep up with the demand.  It could be due to a hard disk that is too slow, or it could be caused by excessive paging.
  3. If the disk is busy almost all the time and there is a large queue, the disk might be a bottleneck.
Avg. Disk Queue Length
This counter displays %Disk Time as a decimal with no defined maximum. (A %Disk Time of 100% equals an Avg. Disk Queue Length of 1.0.)
Preferred Value:  None
Threshold Value:  None
Notes:
  1. This counter is needed when the disk configuration employs multiple controllers for multiple physical disks. In such cases, the overall performance of the disk I/O system (two controllers, for example) could exceed that of an individual disk. The % Disk Time counter would then show only 100%, which represents the potential of a single disk on a single controller, not of the entire system. The real value may be 120%, which the Avg. Disk Queue Length counter would display as 1.2.

Current Disk Queue Length
It is an indication of the number of transactions that are waiting to be processed.
Preferred Value:  (Queue Length - # Spindles on the disks) should be less than 3
Threshold Value:  (Queue Length - # Spindles on the disks) greater than 3 for a sustained period of time
Notes:
  1. Even if % Disk Time is 100%, a Current Disk Queue Length below the threshold above means the disk is being utilized properly.
  2. The queue is an important measure for services that operate on a transaction basis. Just like the line at the supermarket, the queue is representative of not only the number of transactions, but also the length and frequency of each transaction.
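The spindle-adjusted queue check above is simple arithmetic, sketched here in Python. The function names are hypothetical, and the number of spindles has to be supplied by whoever knows the disk configuration, since no counter reports it.

```python
def adjusted_disk_queue(current_queue_length, spindles):
    """Queue length left over after each spindle services one request."""
    return current_queue_length - spindles


def disk_queue_ok(current_queue_length, spindles, limit=3):
    """True if (Queue Length - # Spindles) is below the preferred limit of 3."""
    return adjusted_disk_queue(current_queue_length, spindles) < limit


# Hypothetical 4-spindle array with two Current Disk Queue Length samples.
print(disk_queue_ok(5, spindles=4))
print(disk_queue_ok(8, spindles=4))
```

Because this counter is an instantaneous value, a single sample over the limit is not conclusive; the threshold applies to a sustained period.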
Disk Bytes / sec

It indicates the rate at which bytes are transferred and is the primary measure of disk throughput.
Preferred Value:  As high as possible
Threshold Value:  None
Notes:
  1. It is the primary measure of disk throughput.
  2. It answers the question, how fast is data being moved (in bytes)?
Disk Transfers / sec
It measures the number of reads and writes completed per second, regardless of how much data they involve.
Preferred Value:  Less than 50 per physical disk
Threshold Value:  50 per physical disk
Notes:
  1. It answers the question, how fast are data requests being serviced?
  2. It measures disk utilization.
Split IO / sec
It shows the rate, in incidents per second, at which input/output (I/O) requests to the disk were split into multiple requests.
Preferred Value:  None
Threshold Value:  None
Notes:
  1. A split I/O might result from requesting data in a size that is too large to fit into a single I/O, or from a fragmented disk subsystem.
  2. On single-disk systems, a high rate for this counter tends to indicate disk fragmentation.

Process

% Processor Time
It reports the processor utilization of an individual process.
Preferred Value:  Less than 70%
Threshold Value:  70%
Notes:
  1. If one process is utilizing more than 70% of the processor, it is leaving very little time for other processes.
Private Bytes
It shows the size, in bytes, of the memory that this process has allocated and that cannot be shared with other processes.
Preferred Value:  Less than 60% of RAM
Threshold Value:  60% of RAM
Notes:
  1. If a memory leak is occurring, this value will tend to steadily rise.
Working Set
The working set is the set of memory pages that were touched recently by the threads in the process.
Preferred Value:  None
Threshold Value:  None
Notes:
  1. If free memory in the computer is above a threshold, pages are left in the working set of a process, even if they are not in use.
  2. When free memory falls below a threshold, pages are trimmed from working sets. If the pages are needed, they will be soft-faulted back into the working set before leaving main memory.
  3. The size of the working set will grow and shrink as the VMM can permit. When memory is becoming scarce the working sets of the applications will be trimmed. When memory is plentiful the working sets are allowed to grow.
  4. Larger working sets mean more code and data in memory making the overall performance of the applications increase. However, a large working set that does not shrink appropriately is usually an indication of a memory leak.
Virtual Bytes
It shows the size, in bytes, of the virtual address space that the process is using.
Preferred Value:  None
Threshold Value:  None
Notes:
  1. If the program with the memory leak is allocating virtual memory in its own address space, the memory leak should be evident by tracking per process Virtual Bytes Counter. If the amount of Virtual Bytes allocated for a process increases steadily over the life of a process, there is good reason to suspect a leak.
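One way to make "increases steadily over the life of a process" concrete is to fit a straight line through the samples and look at its slope. The sketch below uses a plain least-squares fit; the function names and the `min_slope` parameter are hypothetical, and requiring that the series never drops is a deliberately strict assumption that can be relaxed for noisy data. The same check could be applied to Private Bytes.

```python
def growth_per_sample(values):
    """Least-squares slope of values against their sample index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def grows_steadily(values, min_slope):
    """True if the series never drops and its fitted slope reaches min_slope."""
    never_drops = all(b >= a for a, b in zip(values, values[1:]))
    return never_drops and growth_per_sample(values) >= min_slope


# Hypothetical Virtual Bytes samples (in MB) for one process.
virtual_mb = [100, 110, 120, 130, 140]
print(growth_per_sample(virtual_mb))
print(grows_steadily(virtual_mb, min_slope=5))
```

The slope is expressed per sample, so it needs to be multiplied by the sampling rate to get growth per second or per hour.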

Processor

% Interrupt Time
It shows the percentage of time that the processor spent receiving and servicing hardware interrupts during the sample interval.
Preferred Value:  Less than 50 %
Threshold Value:  50 %
Notes:
  1. If this value exceeds 50% of the processor time then there might be some hardware issues present.
% Privileged Time
It is the amount of time the processor was busy with Kernel mode operations.
Preferred Value:  None
Threshold Value:  None
Notes:
  1. If the processor is very busy and this value is high, it usually indicates that some type of NT service is having difficulty.
% Processor Time
It shows the percentage of elapsed time that the processor spends executing non-idle threads.
Preferred Value:  Less than 80 %
Threshold Value:  80 %
Notes:
  1. It provides processor utilization information.
  2. Correlate it with System \ Processor Queue Length.  Utilization above 80% points to a processor problem only if a queue is building up as well; otherwise there is no issue with processor utilization.
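The correlation described above can be sketched as a check that flags the processor only when both conditions hold at once. The function name `processor_bottleneck` is hypothetical; the 80% limit is the threshold above, and the queue limit of 3 per processor is the preferred value listed for System \ Processor Queue Length.

```python
def processor_bottleneck(cpu_pct, queue_length, processors,
                         cpu_limit=80.0, queue_per_processor=3):
    """True only if utilization is high AND a processor queue has built up."""
    busy = cpu_pct > cpu_limit
    queued = queue_length > queue_per_processor * processors
    return busy and queued


# Hypothetical samples from a 4-processor machine.
print(processor_bottleneck(cpu_pct=92, queue_length=14, processors=4))
print(processor_bottleneck(cpu_pct=92, queue_length=5, processors=4))
```

The second call shows the case the note describes: high utilization without a queue is not reported as a problem.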
Interrupts/sec
It shows the number of interrupts per second that the processor was asked to respond to.
Preferred Value:  Less than 1000
Threshold Value:  Greater than 1000 for a sustained period of time
Notes:
  1. A sustained value over 1000 is usually an indication of a problem. Problems include poorly configured drivers, errors in drivers, excessive utilization of a device (like a NIC on an IIS server), or hardware failure.
  2. Compare this value with System \ System Calls/sec. If Interrupts/sec is much larger over a sustained period, there is probably a hardware issue present in the system.

System

Context Switches/sec
It indicates the rate at which the processors switch thread contexts.
Preferred Value:  1500 – 3000 per processor
Threshold Value:  Greater than 6000 per processor
Notes:
  1. A high number may indicate high lock contention or transitions between user and kernel mode.
  2. A high context-switch rate often indicates that there are too many threads competing for the processors on the system.
  3. Context Switches/sec should increase linearly with throughput, load, and the number of CPUs. If it increases exponentially, there is a problem.
  4. Context switches occur when a running thread voluntarily relinquishes the processor, or is preempted by a higher priority, ready thread.
  5. In the case of IIS, compare this value with Web Service \ Total Method Requests / sec.  The number of context switches per request (Context Switches / sec divided by Total Method Requests / sec) should be low.
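The per-processor ranges and the context-switches-per-request ratio above can be sketched as follows. The function names and the status labels are hypothetical, and treating anything up to 3000 per processor as acceptable is an assumption about how to read the preferred range; the 3000 and 6000 limits are the values given above.

```python
def context_switch_band(context_switches_per_sec, processors):
    """Rate the system-wide context switch rate on a per-processor basis."""
    per_processor = context_switches_per_sec / processors
    if per_processor > 6000:
        return "above threshold"   # beyond the threshold value
    if per_processor > 3000:
        return "elevated"          # beyond the preferred range
    return "ok"


def context_switches_per_request(context_switches_per_sec, requests_per_sec):
    """Context Switches / sec divided by Total Method Requests / sec."""
    if requests_per_sec <= 0:
        return None  # no requests were served in this interval
    return context_switches_per_sec / requests_per_sec


# Hypothetical samples from a 4-processor IIS server.
print(context_switch_band(16000, processors=4))
print(context_switches_per_request(12000, 400))
```

Tracking the ratio across load levels helps to tell whether context switches grow roughly in line with throughput, as note 3 expects.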
Processor Queue Length
It indicates how many threads are ready, but have to wait for a processor.
Preferred Value:  Less than 3 * (Number of Processors)
Threshold Value:  Greater than 3 * (Number of Processors) for a sustained period of time
Notes:
  1. There is a single queue for processor time, even on computers with multiple processors. Therefore, if a computer has multiple processors, you need to divide this value by the number of processors servicing the workload.
  2. Unlike disk queue counters, it counts only waiting threads, not those being serviced.
  3. A sustained processor queue of greater than two threads generally indicates processor congestion.
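Since there is a single queue for all processors, the raw counter value has to be divided by the processor count before it is compared with a per-processor limit. A minimal sketch of that normalization, with hypothetical function names and the limit of 3 taken from the threshold above:

```python
def queue_per_processor(processor_queue_length, processors):
    """Divide the single system-wide queue by the number of processors."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return processor_queue_length / processors


def queue_exceeds_threshold(processor_queue_length, processors, per_processor_limit=3):
    """True if the normalized queue is above the per-processor limit."""
    return queue_per_processor(processor_queue_length, processors) > per_processor_limit


# Hypothetical samples from a 4-processor machine.
print(queue_per_processor(12, 4))
print(queue_exceeds_threshold(14, 4))
```

The same raw value of 14 would be far more alarming on a single-processor machine than on a 16-processor one, which is why the division matters.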

System Calls/sec
It measures the rate of calls made to the system components, that is, to kernel-mode services.
Preferred Value:  None
Threshold Value:  None
Notes:
  1. Indirectly, it measures how busy the system is taking care of applications and services (the software side).
  2. When compared to the Interrupts/Sec it will give you an indication of whether processor issues are hardware or software related.

Network Interface


Bytes Total/sec
It indicates the rate at which bytes are sent and received over each network adapter, including framing characters.
Preferred Value:  Less than 50% of Current Bandwidth of NIC
Threshold Value:  50% of Current Bandwidth of NIC
Notes:
  1. It shows overall how much data is going in and out of the interface.
  2. Once a NIC bottleneck is identified, Bytes Received / sec and Bytes Sent / sec can be tracked for further analysis.
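To compare Bytes Total/sec with the NIC bandwidth, the units have to match: the Network Interface \ Current Bandwidth counter is reported in bits per second, while Bytes Total/sec is in bytes. The sketch below converts bytes to bits before computing a utilization percentage; the function names are hypothetical and the 50% limit is the threshold value above.

```python
def nic_utilization_pct(bytes_total_per_sec, current_bandwidth_bps):
    """Utilization of one adapter as a percentage of its current bandwidth."""
    if current_bandwidth_bps <= 0:
        raise ValueError("current bandwidth must be positive")
    # Bytes Total/sec is in bytes, Current Bandwidth is in bits per second.
    return bytes_total_per_sec * 8 * 100 / current_bandwidth_bps


def nic_over_threshold(bytes_total_per_sec, current_bandwidth_bps, limit_pct=50.0):
    """True if utilization has reached the 50% threshold value."""
    return nic_utilization_pct(bytes_total_per_sec, current_bandwidth_bps) >= limit_pct


# Hypothetical 1 Gbps adapter moving 50 MB per second.
print(nic_utilization_pct(50_000_000, 1_000_000_000))
print(nic_over_threshold(50_000_000, 1_000_000_000))
```

Forgetting the factor of 8 makes a link look eight times less busy than it is, so the conversion is worth keeping explicit.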

References:

  1. Windows Server 2003 Performance Testing
  2. How to find memory leak using performance monitor?
  3. Detecting memory leak
  4. Detecting processor bottleneck
  5. Monitoring context switches
  6. Detecting disk bottlenecks
  7. Getting Started
  8. Network Health
  9. Keep tabs on your network traffic
  10. Performance monitor counters
  11. Working with disk counters
  12. Monitoring activities on multiple processor system
  13. Windows page file and SQL server
  14. ASP.NET performance monitoring
  15. Key performance counters and their thresholds
  16. Suggested performance counters to watch
  17. Detecting cache bottleneck
  18. RAM, virtual memory and page file, all that stuff


