This month, I’ll present a few system tools that can be helpful when trying to diagnose your Linux system’s health, improve performance, and so on. This installment is intended for users who are newer to Linux and who might not be familiar with, or even aware of, all the utilities that are already available at their fingertips.
I often feature tools that are not included by default on a Linux system in the Tool of the Month column, but this installment of Open Road will present some utilities that are part of a “standard” Linux install, or at least packaged and available for most Linux distros — whether they’re actually installed by default, or not. I’m specifically thinking of Debian here, since a minimal Debian install won’t include several of the utilities covered this month. Not to fear, though — they’re just an apt-get away!
Most Linux users are already comfortable using top, ps, and free to check out what’s running on their systems, so I won’t spend any time on those utilities. Also, this isn’t a fully comprehensive guide to all useful Linux utilities, but I hope it will serve as a jumping-off point for Linux users who are still familiarizing themselves with advanced utilities.
Processor statistics with mpstat
Let’s start off with mpstat. This utility will provide information about your system’s processor or processors, including CPU utilization by user-level applications, CPU utilization by system-level applications, the number of interrupts received by CPUs, and the idle time for your CPU(s) (including idle time spent waiting for disk I/O).
The syntax is pretty simple. Running mpstat by itself (or mpstat 0) will display the averages for your processor since system startup. The display will look something like this:
Linux 2.4.26-1-686 (serenity.zonker.net)   05/18/04

08:11:41  CPU  %user  %nice  %system  %iowait  %irq  %soft  %idle  intr/s
08:11:41  all   0.12   0.00     0.03     0.00  0.00   0.00  99.85  101.43
It probably goes without saying that the first field is the time that mpstat ran. Because I ran mpstat without specifying the CPU on which I wanted statistics, it simply shows global statistics. In this case, the system in question has only one CPU anyway, so there’s no point in specifying the CPU. You can specify the CPU using the mpstat -P n option, where n is the CPU number. Note that mpstat starts counting at 0 instead of 1.
The %user, %system, %nice, and %idle values should already be familiar from top. The value that is particularly interesting to most admins is %iowait, which shows how much time the CPU(s) spend idle waiting for disk I/O. Obviously, this system isn’t terribly busy, so disk I/O isn’t a big bottleneck here. However, mpstat can be useful in finding out whether your CPUs are waiting on reads from disk.
If you want to see real-time statistics, this is also possible. Let’s say you want to see CPU statistics at one-second intervals. You can run mpstat 1 and you’ll get the same readouts, except that each report covers the interval since the previous report rather than the aggregated statistics since system startup. If you only want to see readings for a limited time, say one minute, you can run mpstat 1 60. This will update the statistics 60 times at one-second intervals.
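If you’d rather not stare at a minute’s worth of readings, a small filter can pick out the interesting ones. The following is only a sketch: the function name high_iowait and the default threshold of 10 percent are my own assumptions, and it assumes 24-hour timestamps as in the sample output above. It finds the %iowait column by its header name, because the column layout differs between mpstat versions.

```shell
#!/bin/sh
# high_iowait: read mpstat-style output on stdin and print the timestamp
# and %iowait of every sample above a threshold (hypothetical helper).
# Usage: mpstat 1 60 | high_iowait 10
high_iowait() {
    awk -v limit="${1:-10}" '
        # Header line: remember which column holds %iowait.
        {
            for (i = 1; i <= NF; i++)
                if ($i == "%iowait") { col = i; next }
        }
        # Data line: print it only if the remembered column exceeds the limit.
        col && $col ~ /^[0-9.]+$/ && $col + 0 > limit { print $1, $col }
    '
}
```

On an idle machine like the one shown earlier, this prints nothing at all, which is exactly what you want to see.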
Earlier versions of mpstat may show different information. The version used here is 5.0.3, taken from the Debian testing repository.
Virtual memory statistics with vmstat
Next on the list is vmstat. As the name implies, this utility reports virtual memory statistics. Running vmstat with no options produces the following output:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      8   8040  26728 300256    0    0    51    37    9    39  1  1 97  1
There’s quite a bit of info crammed into vmstat’s output. The first two fields describe processes waiting for runtime and processes that are in uninterruptible sleep, respectively. The next four fields cover the amount of virtual memory in use (swpd), free memory, memory used as buffers, and memory being used as cache. The next two fields show how much memory is being swapped in from disk and swapped out to disk per second.
The next two fields (under io) show blocks received from block devices and blocks sent to block devices. The fields under system display interrupts per second and context switches per second. Finally, the fields under cpu display time spent running user-level code (us), time spent running system code, time spent idle, and time spent waiting for I/O. (Yes, there is some overlap between utilities.)
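With that many columns, it’s easy to lose track of which number is which. Here’s a minimal sketch of a filter that pulls a single column out of vmstat output by its header name; the function name vmstat_field is hypothetical, and the sample fed to it is the output shown above.

```shell
#!/bin/sh
# vmstat_field: print the value of one named column from vmstat output
# read on stdin (hypothetical helper). Usage: vmstat | vmstat_field wa
vmstat_field() {
    awk -v name="$1" '
        # The header row is the one whose first two fields are "r" and "b".
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            next
        }
        # Every row after the header is a sample; print the requested column.
        col { print $col }
    '
}

# Try it on the sample output from this article.
vmstat_field wa <<'EOF'
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      8   8040  26728 300256    0    0    51    37    9    39  1  1 97  1
EOF
# prints 1
```

Looking columns up by name rather than by position means the filter keeps working if your version of vmstat adds or reorders columns.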
That’s just the default output, however. Using vmstat, it’s possible to drill down a bit deeper and look at other information. For example, the -p option displays detailed statistics about a given partition, and vmstat -d displays disk statistics. If you’d like a one-time display of event counters and memory stats, use vmstat -s -S M, which produces output like this:
377 M total memory
373 M used memory
239 M active memory
105 M inactive memory
3 M free memory
20 M buffer memory
303 M swap cache
760 M total swap
0 M used swap
760 M free swap
499224 non-nice user cpu ticks
6027 nice user cpu ticks
134899 system cpu ticks
31925112 idle cpu ticks
172288 IO-wait cpu ticks
32025 IRQ cpu ticks
74892 softirq cpu ticks
16745809 pages paged in
12265466 pages paged out
0 pages swapped in
2 pages swapped out
350426106 interrupts
13250775 CPU context switches
1084569411 boot time
16563 forks
The -s option tells vmstat to produce the table output; the -S option, given M as its argument, tells vmstat to report memory statistics in megabytes.
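The raw tick counters are more useful once you turn them into percentages. As a rough sketch, the function below (the name iowait_share is my own) adds up every line ending in “cpu ticks” and reports the share that was spent in IO-wait. It assumes the counter names shown in the sample above; versions of vmstat that list additional tick counters may need adjusting.

```shell
#!/bin/sh
# iowait_share: read `vmstat -s` output on stdin and print the percentage
# of all CPU ticks spent in IO-wait (hypothetical helper).
# Usage: vmstat -s | iowait_share
iowait_share() {
    awk '
        / cpu ticks *$/ {
            total += $1                      # every "... cpu ticks" counter
            if ($0 ~ /IO-wait/) wait = $1    # the IO-wait counter itself
        }
        END { if (total > 0) printf "%.2f\n", 100 * wait / total }
    '
}

# Try it on the tick counters from the sample output above.
iowait_share <<'EOF'
499224 non-nice user cpu ticks
6027 nice user cpu ticks
134899 system cpu ticks
31925112 idle cpu ticks
172288 IO-wait cpu ticks
32025 IRQ cpu ticks
74892 softirq cpu ticks
EOF
# prints 0.52
```

That works out to about half a percent of CPU time spent waiting on I/O since boot, which agrees with the mostly idle picture painted by the other numbers.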
Running vmstat -m will give you the slabinfo output (taken from /proc/slabinfo), which is more information about cached objects in the Linux kernel than most folks need (or want…). But it’s there if you need it. Speaking of slabinfo, the slabtop utility can be used to produce a top-like display of kernel slab information.
Note that some of vmstat’s options only work with kernels newer than 2.5.70.
CPU and I/O statistics with iostat
Next up is iostat. Like mpstat and vmstat, this utility displays information about your system’s CPU and I/O, though in a different format and with some different information.
The default output of iostat is the average CPU and device utilization since system startup. Running iostat n (where n is an interval in seconds) will produce a display of device and CPU utilization since the last report. For example, iostat 5 produces the following:
Linux 2.6.4-52-default (yggdrasil) 05/18/2004
avg-cpu: %user %nice %sys %iowait %idle
1.54 0.02 0.74 0.53 97.18
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda 2.15 101.72 75.06 33511064 24729224
hdc 0.00 0.02 0.00 7656 0
avg-cpu: %user %nice %sys %iowait %idle
5.20 0.00 2.40 0.20 92.20
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda 6.20 30.40 84.80 152 424
hdc 0.00 0.00 0.00 0 0
The first display is the average since system boot; the second covers the most recent five-second interval. If you’d like to display only CPU usage, use the -c option. The -d option displays only disk utilization.
If you’d like slightly more readable output, you can use the -k option to display reads and writes in kilobytes rather than blocks. There are several other options worth looking into; be sure to check the iostat manpage for all the options. As with the other utilities, some iostat functionality is dependent on having a 2.5 or newer kernel.
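Since the first report always shows averages since boot, it can be misleading when you’re trying to watch what the system is doing right now. Here’s a minimal sketch of a filter that drops that first report and passes the rest through. The function name skip_boot_report is hypothetical, and it assumes the CPU report is included (that is, you’re not running iostat -d), because it uses the avg-cpu line to tell where each report begins.

```shell
#!/bin/sh
# skip_boot_report: drop the first iostat report (averages since boot)
# and pass through the interval reports that follow (hypothetical helper).
# Usage: iostat 5 | skip_boot_report
skip_boot_report() {
    awk '
        /^avg-cpu:/ { reports++ }   # each report starts with an avg-cpu line
        reports >= 2                # print only from the second report on
    '
}
```

Run against the sample output above, this would print only the second avg-cpu block and the device lines that follow it.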
Getting system activity information with sar
The next utility is sar. This utility can be used to display system activity information, or to collect information for further study. The information collated and displayed by sar is largely available using the other utilities that I’ve already discussed.
However, sar also displays a wealth of information not available through the other utilities. Running sar -A, for example, will display just about every relevant piece of information you’d want about your system since midnight of the current day. I won’t paste the output here, as it’s quite lengthy. Give it a try on your own system, though.
If your system isn’t collecting the data, the command will fail with something like “Cannot open /var/log/sa/sa05: No such file or directory.” In this case, you’ll need to enable logging using sadc (the system activity data collector). I believe that most Linux distros include an init script for this, though it might not be enabled by default.
Having consistent system troubles and not quite sure what’s causing them? This is where sar shines, because it produces output that identifies system load by time. Let’s say you have a system that bogs down or even fails every night, but you’re not quite sure why. Maybe you suspect that disk I/O is killing your system, so you run sar -b, which displays the I/O and transfer rate statistics in ten-minute intervals, like this:
11:00:00 AM tps rtps wtps bread/s bwrtn/s
11:10:00 AM 0.08 0.00 0.08 0.01 1.09
11:20:00 AM 0.09 0.00 0.09 0.00 1.31
11:30:00 AM 0.13 0.00 0.13 0.00 2.06
11:40:00 AM 0.17 0.01 0.16 0.34 2.85
That’s just a small sample of the output, of course. This shows the total transfers per second to and from disk (tps), read requests per second (rtps), write requests per second (wtps; look to see whether wtps is outstripping rtps), and blocks read from and written to your drives per second.
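A full day of ten-minute samples is a lot of rows to scan by eye. As a sketch of how you might narrow things down, the function below (busiest_interval is a name I made up) reads sar -b output and prints the sample with the highest tps value. It finds the tps column from the header row, so it should cope with both 24-hour and AM/PM timestamps.

```shell
#!/bin/sh
# busiest_interval: read `sar -b` output on stdin and print the sample
# with the highest tps value (hypothetical helper).
# Usage: sar -b | busiest_interval
busiest_interval() {
    awk '
        BEGIN { max = -1 }
        # Until the header row is seen, look for the tps column.
        !col {
            for (i = 1; i <= NF; i++) if ($i == "tps") col = i
            next
        }
        # Skip the trailing "Average:" row, blank lines, and repeated headers.
        $1 == "Average:" || $col !~ /^[0-9.]+$/ { next }
        # Remember the row with the largest tps seen so far.
        $col + 0 > max { max = $col + 0; line = $0 }
        END { if (line != "") print line }
    '
}

# Try it on the sample output from this article.
busiest_interval <<'EOF'
11:00:00 AM tps rtps wtps bread/s bwrtn/s
11:10:00 AM 0.08 0.00 0.08 0.01 1.09
11:20:00 AM 0.09 0.00 0.09 0.00 1.31
11:30:00 AM 0.13 0.00 0.13 0.00 2.06
11:40:00 AM 0.17 0.01 0.16 0.34 2.85
EOF
# prints 11:40:00 AM 0.17 0.01 0.16 0.34 2.85
```

Once you know which interval was busiest, you can go back to the full sar output and look at what else was happening around that time.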
Another handy use for sar is to display network statistics. Running sar -n FULL will produce a full report of network statistics, including errors and sockets in use.
To start at a specific time, you can use the -s option. For example, if you only want statistics since 6 AM, run sar -s 06:00:00.
You can also use sar to display system load queues (-q), memory and swap space utilization (-r), and a number of other statistics about your system. It definitely pays to spend a little time reading the sar manpage!
Visualizing system activity with isag
If all of the textual output doesn’t do much for you, there’s isag. This little utility produces graphs of system activity that make it easy to see what’s going on with your system. It uses data produced by sar or sadc. It expects to find files under /var/log/sa in the format sann, where nn is the day of the month. On my system, the files are in the format sa.YYYY_MM_DD instead, so I have to use isag -m sa.2004_[0-9][0-9]_[0-9][0-9] to let isag know what “mask” to expect for the datafiles.
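If isag doesn’t show any datafiles, it’s worth checking that the mask really matches what’s on disk. The sketch below, with the made-up function name matching_datafiles, uses the same shell pattern as the mask above to list the files it matches. It assumes the datafiles live under /var/log/sa unless you pass it another directory.

```shell
#!/bin/sh
# matching_datafiles: list the files in a directory whose names fit the
# sa.2004_MM_DD mask passed to isag -m above (hypothetical helper).
# Usage: matching_datafiles            (checks /var/log/sa)
#        matching_datafiles /some/dir  (checks another directory)
matching_datafiles() {
    dir=${1:-/var/log/sa}
    for f in "$dir"/sa.2004_[0-9][0-9]_[0-9][0-9]; do
        # An unmatched pattern is left as-is by the shell, so test for existence.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}
```

If this prints nothing, the mask doesn’t match any of your datafiles, and isag won’t find them either.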
Once you start isag, you can choose the datafile you want to display and then the type of chart you want to see. There are 10 different charts displayed by isag, including I/O transfer rate, CPU utilization, inode status, and paging statistics. Need to produce visual proof that a system is too heavily loaded to get budget approval to buy a new one? Show your boss and the beancounters a nice colorful graph produced by isag that shows the system is spending too much time swapping data to disk.
Figure 1: isag’s output
Figure 1 is an example of isag’s output, displaying memory and swap used.
As you can see, it’s much easier to grok what’s going on when you produce an easy-to-read graph of the data. Using isag, it’s also easy for me to switch between daily graphs to compare daily averages as well as hourly ones. It’s a much faster way to pick out trends in system usage from the data collected by sar and sadc.
There’s plenty of information available for system troubleshooting if you know where to find it. With these tools at your disposal, you’ll be able to learn a lot more about your system and its performance.
(Originally written for and published on UnixReview.com, which is now defunct. Revived from Archive.org b/c it still seems useful.)