Introduction
One of the most common mistakes in performance work is the attempt to
diagnose an application performance issue from system-wide aggregated
statistics. When a production performance issue is reported, resist
the temptation to study system-wide statistics and extrapolate from
them, particularly if they are historic smoothed averages with no
indication of standard deviation (e.g. historic sar data collected at
five-minute intervals from cron). It is usually far more fruitful to
identify the thread of execution that is the basis of the complaint
(e.g. "the report is running slowly" or "my screen is updating
slowly"). Once the complaint has been translated into a thread of
execution, a deterministic profiling method should be used to produce
a performance profile of that thread.
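To see why a smoothed average with no standard deviation can mislead,
consider the following minimal Python sketch. The utilisation figures
are invented purely for illustration (they are not measurements), and
the function name summarise is hypothetical.

```python
import statistics


def summarise(samples):
    """Return the mean, population standard deviation and maximum."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.pstdev(samples),
        "max": max(samples),
    }


# Hypothetical per-second CPU utilisation (%) across one five-minute
# window: 290 quiet seconds followed by a 10-second stall at 100%.
per_second = [10.0] * 290 + [100.0] * 10

summary = summarise(per_second)
print(
    f"mean={summary['mean']:.1f}% "
    f"stdev={summary['stdev']:.1f} "
    f"max={summary['max']:.0f}%"
)
```

The five-minute mean for this window is 13%, which looks healthy, yet
a user waited through ten seconds of saturation. Only the standard
deviation and the maximum hint that something happened.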
Truss & Strace
The simplest tools for doing this are truss (Solaris) and strace
(Linux), which trace system calls. Later versions of truss can also
report user space library calls (on Linux, ltrace fills this role).
Caution should be exercised when tracing user libraries, because the
information can be misleading: these tools do not separate nested
library calls (i.e. if one library routine calls another, the time is
attributed only to the first call). To a greater or lesser extent
these tools also introduce the Heisenberg effect (affecting what you
are observing). To get an indication of the tool's impact, time an
application run with and without it. Most of these tools use
interposing, breakpoints or watchpoints, all of which can have a
noticeable effect on execution flow, as they single step code,
introduce significant additional code in the interposing path or
require complex memory checks (this may change in the future, as most
modern processors have specialised debug registers for breakpoints and
watchpoints).
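The with-and-without timing comparison can be sketched in Python as
follows. This is an illustrative sketch, not a benchmark harness: the
workload is a hypothetical stand-in for the real application, the
helper names time_command and slowdown are invented here, and the
example assumes a Linux host with strace on the PATH (on Solaris,
truss would take its place).

```python
import os
import shutil
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Return the best wall-clock time in seconds over `runs` runs of argv."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        completed = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        elapsed = time.perf_counter() - start
        if completed.returncode != 0:
            raise RuntimeError(f"{argv[0]} exited with {completed.returncode}")
        best = min(best, elapsed)
    return best


def slowdown(baseline_seconds, traced_seconds):
    """Return how many times slower the traced run was than the baseline."""
    return traced_seconds / baseline_seconds


if __name__ == "__main__":
    # Hypothetical system-call-heavy workload; substitute the real application.
    workload = [sys.executable, "-c",
                "import os\nfor _ in range(20000): os.stat('.')"]
    baseline = time_command(workload)
    print(f"untraced: {baseline:.3f}s")

    tracer = shutil.which("strace")  # assumption: Linux with strace installed
    if tracer is None:
        print("strace not found; skipping traced run")
    else:
        try:
            # -f follows child processes, -o writes the trace to a file.
            traced = time_command([tracer, "-f", "-o", os.devnull] + workload)
            print(f"traced:   {traced:.3f}s ({slowdown(baseline, traced):.1f}x)")
        except (RuntimeError, OSError) as exc:
            print(f"traced run failed: {exc}")
```

Taking the best of several runs reduces noise from caching and
scheduling. The slowdown factor depends heavily on how many system
calls the workload makes, so it should be measured per application.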
Dtrace & Systemtap
Whilst truss and strace are useful, far more capable tools have been
delivered or are in development. Sun Microsystems have delivered the
excellent and proven dtrace tool in their Solaris operating
environment. Dtrace provides a scripting language that activates
dynamic probes in both user space and kernel space, allowing the exact
flow of execution to be traced from user space into kernel space. This
is exactly what is required for production performance issues. Dtrace
is currently being developed in many interesting areas, including the
ability to trace various dynamic languages (Java bytecode, Python,
Perl, etc.) and future directions such as the integration of hardware
counters into the dtrace framework. Various vendors are collaborating
on a similar tool for Linux called Systemtap. The project is in its
infancy and, although it is evolving quickly, it has yet to be proven
safe for production use; this is likely to be its biggest challenge.
Dtrace and Systemtap are referred to as deterministic profilers,
meaning they can instrument each function call. Their probes are
extremely lightweight when enabled, allowing almost the exact
execution path to be profiled.
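As a loose analogy only (this is neither dtrace nor Systemtap, and
says nothing about their overhead), the idea of deterministic
profiling can be sketched with Python's own sys.setprofile hook, which
fires on every Python-level function call rather than on a sample. The
functions trace_calls, parse and report are hypothetical names used
for illustration.

```python
import sys
from collections import Counter


def trace_calls(func, *args, **kwargs):
    """Run func and count every Python-level function call made meanwhile."""
    counts = Counter()

    def hook(frame, event, arg):
        # "call" fires once for each Python function entered.
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(hook)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook again
    return result, counts


def parse(record):
    return record.split(",")


def report(records):
    rows = []
    for record in records:
        rows.append(parse(record))
    return rows


result, counts = trace_calls(report, ["a,b", "c,d", "e,f"])
print(dict(counts))
```

Because every call is instrumented, the counts are exact: report is
entered once and parse three times, with no sampling error. The price
is that the hook runs on every call, which is why the weight of each
probe matters so much for this class of tool.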
Oprofile
Linux has a more mature tool known as Oprofile that can be used safely
on production systems. Oprofile is a statistical profiler, meaning it
profiles by taking samples. Various hardware counters can be set, each
with a sample rate. When a counter reaches its set sample point an NMI
(Non Maskable Interrupt) is generated and the handler samples the
program counter. The use of NMIs means that code with interrupts
disabled can be accurately profiled. These samples are then collected
in user space and can be used to generate useful reports. To profile a
thread of execution, a time source is generally sampled, which
provides a statistical sample of what each thread of execution on the
system was doing. The reporting tools can then be used to isolate the
thread of execution of interest. Oprofile is currently being extended
to understand dynamic code such as Java. The main drawback of Oprofile
is that it does not easily identify sleep states such as waiting for
I/O or locks (unless they are spinning). In spite of this drawback
Oprofile is by far the best application profiling tool available in
Linux, particularly if your application is CPU bound.
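The sampling principle, and the sleep-state blind spot, can be
illustrated with a small simulation in Python. This is a sketch under
stated assumptions and does not use Oprofile itself: the timeline, the
function labels and the helper sample_timeline are all hypothetical,
and a sample taken while the thread is asleep is simply not attributed
to it (on a real system it would land on whatever else was running).

```python
from collections import Counter


def sample_timeline(timeline, interval_ms):
    """Sample a simulated thread every interval_ms milliseconds.

    timeline is a list of (label, duration_ms, on_cpu) tuples. A sample
    is credited to a label only if the thread was on CPU at that time.
    """
    counts = Counter()
    total_ms = sum(duration for _, duration, _ in timeline)
    for t in range(0, total_ms, interval_ms):
        elapsed = 0
        for label, duration, on_cpu in timeline:
            if elapsed <= t < elapsed + duration:
                if on_cpu:
                    counts[label] += 1
                break
            elapsed += duration
    return counts


# Hypothetical 100 ms of one thread: half of it spent asleep on I/O.
timeline = [
    ("parse_input", 30, True),
    ("wait_for_disk", 50, False),
    ("compute_totals", 20, True),
]

profile = sample_timeline(timeline, interval_ms=1)
for label, hits in profile.most_common():
    print(f"{label}: {hits} samples")
```

In this simulation the profile credits parse_input with 30 samples and
compute_totals with 20, while the 50 ms spent waiting for disk does
not appear at all, even though it is half of the elapsed time the user
experienced.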
Conclusions
Whilst Linux has a mature tool in Oprofile and a promising tool in
Systemtap, Solaris probably wins the observability stakes currently
due to dtrace and its proven stability and track record, as well as
its future roadmap. Finally, when are system-wide statistics (i.e. sar
and friends) of most use? Mainly for pre-production tuning, for
benchmarking of performance simulations where multiple system elements
must be optimised together, and for capacity planning exercises.