This month, I’ll present a few system tools that can be helpful when you’re trying to diagnose your Linux system’s health, improve performance, and so on. This installment is intended for users who are newer to Linux and who might not be familiar with, or even aware of, all the utilities that are already available at their fingertips.

I often feature tools that are not included by default on a Linux system in the Tool of the Month column, but this installment of Open Road will present some utilities that are part of a “standard” Linux install, or at least packaged and available for most Linux distros — whether they’re actually installed by default, or not. I’m specifically thinking of Debian here, since a minimal Debian install won’t include several of the utilities covered this month. Not to fear, though — they’re just an apt-get away!

Most Linux users are already comfortable using top, ps, and free to check out what’s running on their systems, so I won’t spend any time on those utilities. Also, this isn’t a fully comprehensive guide to all useful Linux utilities, but I hope it will serve as a jumping-off point for Linux users who are still familiarizing themselves with advanced utilities.

Processor statistics with mpstat

Let’s start off with mpstat. This utility will provide information about your system’s processor or processors, including CPU utilization by user-level applications, system-level applications, the number of interrupts received by CPUs, and the idle time for your CPU(s) (including idle time spent waiting for disk I/O).

The syntax is pretty simple. Running mpstat by itself (or mpstat 0) will display the averages for your processor since system startup. The display will look something like this:

Linux 2.4.26-1-686 (serenity.zonker.net)     05/18/04

08:11:41     CPU   %user   %nice %system %iowait    %irq   %soft   %idle    intr/s
08:11:41     all    0.12    0.00    0.03    0.00    0.00    0.00   99.85    101.43

It probably goes without saying that the first field is the time that mpstat ran. Because I ran mpstat without specifying the CPU on which I wanted statistics, it simply shows global statistics. In this case, the system in question has only one CPU anyway, so there’s no point in specifying the CPU. You can specify CPU using the mpstat -P n option, where n is the CPU number. Note that mpstat starts counting at 0 instead of 1.

The %user, %system, %nice, and %idle values should already be familiar from top. The value that is particularly interesting to most admins is %iowait, which shows how much time the CPU(s) spend idle waiting for disk I/O. Obviously, this system isn’t terribly busy, so the disk I/O isn’t a big bottleneck here. However, mpstat can be useful in finding out whether your CPUs are waiting on reads from disk.
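If you only care about one column, such as %iowait, a little awk can pull it out of mpstat’s output. The following is a minimal sketch rather than anything shipped with sysstat: the function name mpstat_column is hypothetical, and it assumes the header row looks like the one shown above. It finds the column by name, so it should keep working if your version of mpstat prints a different set of columns.

```shell
#!/bin/sh
# mpstat_column NAME (hypothetical helper, not part of sysstat):
# read mpstat output on stdin and print the value of the named column
# (for example %iowait) from every data row that follows the header row.
mpstat_column() {
    awk -v want="$1" '
        # "Average:" rows can be shifted by one field when the timestamps
        # carry AM/PM, so skip them rather than risk the wrong column.
        $1 == "Average:" { next }
        # Header row: remember which field holds the column we want.
        {
            found = 0
            for (i = 1; i <= NF; i++) {
                if ($i == want) { col = i; found = 1 }
            }
            if (found) next
        }
        # Data row: print the value only once a header has been seen.
        col && NF >= col && $col ~ /^[0-9.]+$/ { print $col }
    '
}

# Typical use on a live system (requires the sysstat package):
#   mpstat 1 5 | mpstat_column %iowait
```

Fed the sample output above, mpstat_column %iowait prints 0.00, which matches what the table shows for this mostly idle machine.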

If you want to see real-time statistics, this is also possible. Let’s say you want to see CPU statistics at 1 second intervals. You can run mpstat 1 and then you’ll get the same readouts, except that they will reflect the current state of the system rather than the aggregated statistics since system startup. If you only want to see readings for a limited time, say one minute, you can run mpstat 1 60. This will update the statistics 60 times at one-second intervals.

Earlier versions of mpstat may show different information. The version used here is 5.0.3, taken from the Debian testing repository.

Virtual memory statistics with vmstat

Next on the list is vmstat. As the name implies, this utility reports virtual memory statistics. Running vmstat with no options produces the following output:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
0  0      8   8040  26728 300256    0    0    51    37    9    39  1  1 97  1

There’s quite a bit of info crammed into vmstat’s output. The first two fields describe processes waiting for run time (r) and processes that are in uninterruptible sleep (b). The next four fields cover the amount of virtual memory in use (swpd), free memory, memory used as buffers, and memory being used as cache. The next two fields (si and so) show how much memory is being swapped in from disk and out to disk per second.

The next two fields (under io) show blocks received from block devices (bi) and blocks sent to block devices (bo). The fields under system display interrupts per second (in) and context switches per second (cs). Finally, the fields under cpu display time spent running user-level code (us), time spent running system code (sy), time spent idle (id), and time spent waiting for I/O (wa). (Yes, there is some overlap between utilities.)
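The si and so fields are the ones to watch if you suspect a machine is short on memory. As a rough sketch, the hypothetical function below (vmstat_swapping is my name for it, not a real utility) reads vmstat output and prints a line for every sample in which anything was swapped in or out. It assumes the column-name row starts with r and b, as in the output above.

```shell
#!/bin/sh
# vmstat_swapping (hypothetical helper): read vmstat output on stdin and
# print "si=<n> so=<n>" for every sample in which memory was swapped in
# or out. Prints nothing if no swapping was recorded.
vmstat_swapping() {
    awk '
        # Column-name row: locate si and so by name rather than position.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        # Numeric sample row seen after the column-name row.
        si && so && $1 ~ /^[0-9]+$/ && ($si + 0 > 0 || $so + 0 > 0) {
            print "si=" $si " so=" $so
        }
    '
}

# Typical use on a live system: sample once a second for a minute.
#   vmstat 1 60 | vmstat_swapping
```

Run against the sample output above it prints nothing, since both si and so are 0.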

That’s just the default output, however. Using vmstat, it’s possible to drill down a bit deeper and look at other information. For example, the -p option takes a partition name and displays detailed statistics about that partition. vmstat -d displays disk statistics. If you’d like a one-time display of event counters and memory stats, use vmstat -s -S M, which produces output like this:

          377 M total memory
          373 M used memory
          239 M active memory
          105 M inactive memory
            3 M free memory
           20 M buffer memory
          303 M swap cache
          760 M total swap
            0 M used swap
          760 M free swap
       499224 non-nice user cpu ticks
         6027 nice user cpu ticks
       134899 system cpu ticks
     31925112 idle cpu ticks
       172288 IO-wait cpu ticks
        32025 IRQ cpu ticks
        74892 softirq cpu ticks
     16745809 pages paged in
     12265466 pages paged out
            0 pages swapped in
            2 pages swapped out
    350426106 interrupts
     13250775 CPU context switches
   1084569411 boot time
        16563 forks

The -s option tells vmstat to print the table of event counters and memory statistics; -S M tells vmstat to report memory figures in megabytes (units of 1,048,576 bytes).
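Because vmstat -s prints one labeled value per line, it’s easy to script against. Here is a minimal sketch, assuming the “total memory” and “used memory” labels shown above; the function name mem_used_percent is hypothetical. Keep in mind that on the system shown here, “used” includes memory the kernel is using for buffers and cache, so a high figure isn’t necessarily a problem.

```shell
#!/bin/sh
# mem_used_percent (hypothetical helper): read `vmstat -s` output on stdin
# and print used memory as a whole-number percentage of total memory,
# rounded down. Prints nothing if no total was found.
mem_used_percent() {
    awk '
        / total memory[ \t]*$/ { total = $1 }
        / used memory[ \t]*$/  { used = $1 }
        END {
            if (total > 0) printf "%d\n", used * 100 / total
        }
    '
}

# Typical use on a live system:
#   vmstat -s -S M | mem_used_percent
```

With the figures above (373 M used out of 377 M total), this prints 98.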

Running vmstat -m will give you the slabinfo output (taken from /proc/slabinfo), which is more information about cached objects in the Linux kernel than most folks need (or want…). But it’s there if you need it. Speaking of slabinfo, the slabtop utility can be used to produce a top-like display of kernel slab information.

Note that some of vmstat’s options only work with kernels newer than 2.5.70.

CPU and I/O statistics with iostat

Next up is iostat. Like mpstat and vmstat, this utility displays information about your system’s CPU and I/O, though in a different format and with some different information.

The default output of iostat is the average CPU and device utilization since system startup. Running iostat n (where n is an interval in seconds) will produce a display of device and CPU utilization since the last report. For example, iostat 5 produces the following:

Linux 2.6.4-52-default (yggdrasil) 	05/18/2004

avg-cpu:  %user   %nice    %sys %iowait   %idle
           1.54    0.02    0.74    0.53   97.18

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda               2.15       101.72        75.06   33511064   24729224
hdc               0.00         0.02         0.00       7656          0

avg-cpu:  %user   %nice    %sys %iowait   %idle
           5.20    0.00    2.40    0.20   92.20

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda               6.20        30.40        84.80        152        424
hdc               0.00         0.00         0.00          0          0

The first display is the average since system boot; the second covers the five seconds since the previous report. To display only the CPU report, use the -c option. To display only disk utilization, use the -d option.

If you’d like slightly more readable output, you can use the -k option to display statistics in kilobytes rather than blocks. There are several other options worth looking into; be sure to check the iostat manpage for all the options. As with the other utilities, some iostat functionality is dependent on having a 2.5 or newer kernel.
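If you’re watching several drives, you may only want to hear about the busy ones. The sketch below is one way to filter iostat’s device report by transfers per second. The function name iostat_busy and the threshold argument are hypothetical, and it assumes the device table starts with a “Device:” header and ends at a blank line, as in the output above. Note that the first report iostat prints is the since-boot average, so it gets filtered like any other report.

```shell
#!/bin/sh
# iostat_busy MIN_TPS (hypothetical helper): read iostat output on stdin
# and print "<device> <tps>" for every device row whose tps is at least
# MIN_TPS.
iostat_busy() {
    awk -v min="$1" '
        $1 ~ /^Device:?$/ { in_dev = 1; next }   # start of a device table
        NF == 0           { in_dev = 0; next }   # blank line ends the table
        in_dev && $2 + 0 >= min + 0 { print $1, $2 }
    '
}

# Typical use on a live system: report devices doing 5 or more
# transfers per second, checking every 5 seconds.
#   iostat -d 5 | iostat_busy 5
```

Against the sample output above with a threshold of 5, only the second hda row (6.20 tps) gets through.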

Getting system activity information with sar

The next utility is sar. This utility can be used to display system activity information, or to collect information for further study. The information collated/displayed by sar is largely available using the other utilities that I’ve already discussed.

However, sar also displays a wealth of information not available through the other utilities. Running sar -A, for example, will display just about every relevant piece of information you’d want about your system since midnight the current day. I won’t paste the output here, as it’s quite lengthy. Give it a try on your own system, though.

If your system isn’t collecting the data, the command will fail with something like “Cannot open /var/log/sa/sa05: No such file or directory.” In this case, you’ll need to enable logging using sadc (the System activity data collector). I believe that most Linux distros include an init script for this, though it might not be enabled by default.
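Before blaming sar, it can help to check whether today’s data file exists at all. This is a minimal sketch that assumes the /var/log/sa/saDD naming from the error message above, where DD is the two-digit day of the month; your distro may use a different directory or file name. The function names sa_file and sa_collecting are hypothetical.

```shell
#!/bin/sh
# sa_file DIR [DAY] (hypothetical helper): print the path of the sar data
# file for the given two-digit day of the month (default: today).
sa_file() {
    dir=$1
    day=${2:-$(date +%d)}
    printf '%s/sa%s\n' "$dir" "$day"
}

# sa_collecting DIR [DAY] (hypothetical helper): succeed if that data
# file exists and is not empty.
sa_collecting() {
    [ -s "$(sa_file "$@")" ]
}

# Typical use:
#   sa_collecting /var/log/sa || echo "no sar data for today; is sadc running?"
```

This only tells you that a file is present, not that sadc is still writing to it, so treat it as a first check rather than a diagnosis.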

Having consistent system troubles and not quite sure what’s causing them? This is where sar shines, because it produces output that identifies system load by time. Let’s say you have a system that bogs down or even fails every night, but you’re not quite sure why. Maybe you suspect that disk I/O is killing your system, so you run sar -b, which displays the I/O and transfer rate statistics in ten-minute intervals, like this:

11:00:00 AM       tps      rtps      wtps   bread/s   bwrtn/s
11:10:00 AM      0.08      0.00      0.08      0.01      1.09
11:20:00 AM      0.09      0.00      0.09      0.00      1.31
11:30:00 AM      0.13      0.00      0.13      0.00      2.06
11:40:00 AM      0.17      0.01      0.16      0.34      2.85

That’s just a small sample of the output, of course. This shows the total transfers per second to disk (tps), read requests per second (rtps), write requests per second (wtps; look to see whether wtps is outstripping rtps), and blocks read and written per second on your drives.
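When you have a full day of ten-minute samples, scanning for the worst interval by eye gets tedious. As a sketch, the hypothetical function below (sar_peak_tps is my name for it) reads sar -b output and prints the timestamp and value of the interval with the highest tps. It assumes a header row containing a column literally named tps, as shown above, and it ignores the “Average:” summary row.

```shell
#!/bin/sh
# sar_peak_tps (hypothetical helper): read `sar -b` output on stdin and
# print the timestamp and tps value of the interval with the highest
# transfers per second.
sar_peak_tps() {
    awk '
        # Header row: find the tps column by name, then move on.
        {
            for (i = 1; i <= NF; i++) if ($i == "tps") { col = i; next }
        }
        # The summary row is not an interval, so leave it out.
        $1 == "Average:" { next }
        # Data row: keep the row with the largest tps seen so far.
        col && NF >= col && $col ~ /^[0-9.]+$/ {
            if (!seen || $col + 0 > max) {
                max = $col + 0; seen = 1
                # The timestamp is every field before the tps column
                # (one field for 24-hour time, two with AM/PM).
                stamp = $1
                for (i = 2; i < col; i++) stamp = stamp " " $i
                value = $col
            }
        }
        END { if (seen) print stamp, value }
    '
}

# Typical use on a live system:
#   sar -b | sar_peak_tps
```

For the four sample rows above, it prints 11:40:00 AM 0.17.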

Another handy use for sar is to display network statistics. Running sar -n FULL will produce a full report of network statistics, including errors and sockets in use.

To start at a specific time, you can use the -s option. For example, if you only want statistics since 6 AM, run sar -s 06:00:00.

You can also use sar to display system load queues (-q), memory and swap space utilization (-r), and a number of other statistics about your system. It definitely pays to spend a little time reading the sar manpage!

Visualizing system activity with isag

If all of the textual output doesn’t do much for you, there’s isag. This little utility produces graphs of system activity that make it easy to visualize what’s going on with your system. It uses the data produced by sar or sadc, and it expects to find files under /var/log/sa in the format sann, where nn is the day of the month. On my system, the files are in the format sa.YYYY_MM_DD instead, so I have to use isag -m sa.2004_[0-9][0-9]_[0-9][0-9] to let isag know what “mask” to expect for the datafiles.

Once you start isag, you can choose the datafile you want to display and then the type of chart you want to see. There are 10 different charts displayed by isag, including I/O transfer rate, CPU utilization, inode status, and paging statistics. Need visual proof that a system is too heavily loaded, so you can get budget approval to buy a new one? Show your boss and the beancounters a nice colorful graph produced by isag that shows the system is spending too much time swapping data to disk. Figure 1 is an example of isag’s output, displaying memory and swap used.

Figure 1: isag’s output

As you can see, it’s much easier to grok what’s going on when you produce an easy-to-read graph of the data. Using isag, it’s also easy for me to switch between daily graphs to compare daily averages as well as hourly ones. It’s a much faster way to pick out trends in system usage from the data collected by sar and sadc.

There’s plenty of information available for system troubleshooting if you know where to find it. With these tools at your disposal, you’ll be able to learn a lot more about your system and its performance.

(Originally written for and published on UnixReview.com, which is now defunct. Revived from Archive.org b/c it still seems useful.)