Saturday, August 24, 2013

Troubleshooting & Solving Linux disk space problems

In this blog post I want to explain how you can troubleshoot disk space problems.

Find out which files are using all the disk space.

Graphical desktop

When you have a graphical desktop at hand, you should really consider installing Baobab (known to Ubuntu users as Disk Usage Analyzer). This application is rather straightforward.

You can click on 'Scan home' to check the disk usage in your home directory. You can also click the button with the hard disk symbol to start the analysis at your root directory. The graph on the right shows the usage per folder, which is handy to quickly identify greedy disk space consumers.

Command line

When you don't have a graphical desktop at hand, you can fall back on the df and du commands.

df

The df tool shows a table that reports the disk space usage per file system. It takes a -h flag to write the output in human-readable format. A table produced by df -h looks like the following:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        28G   12G   16G  43% /
udev            3.9G  4.0K  3.9G   1% /dev
tmpfs           1.6G  1.1M  1.6G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            3.9G  520K  3.9G   1% /run/shm
/dev/sda7       193G  135G   48G  74% /home
/dev/sda8       121G  120G    1G  99% /tmp

As you can see, it is a handy tool to quickly check the state of your filesystems, and it gives a good first indication of whether problems are at hand.
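
Spotting the nearly full filesystems in that table can also be scripted. The following is only a minimal sketch: the function name full_filesystems and the default threshold of 90 percent are illustrative choices, and it assumes mount points without spaces. It reads the output of df -P (the POSIX format, one line per filesystem) on standard input.

```shell
# Print mount point and usage for every filesystem at or above a threshold.
# Reads `df -P` output on stdin. The name and the default of 90 are
# illustrative; mount points containing spaces are not handled.
full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" 'NR > 1 {   # NR > 1 skips the header line
        use = $5
        sub(/%/, "", use)                 # "99%" -> "99"
        if (use + 0 >= limit) print $6, $5
    }'
}
```

It could be used as df -P | full_filesystems 90. For the table above this would report only /tmp, which sits at 99%.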

du

The du tool estimates the disk space usage of files and directories. It takes two very handy flags that I cannot live without anymore.

  • -h -> Human-readable
  • --max-depth=N -> Summarize directories up to the Nth level

The human-readable flag converts the sizes, which du reports in blocks of 1024 bytes by default, into more readable units like K, M and G. The df command behaves the same way.

Let's illustrate the du command with an example:

ow@D ~/Desktop $ du -h --max-depth=1 /tmp
du: cannot read directory `/tmp/vmware-root': Permission denied
4.0K /tmp/vmware-root
8.0K /tmp/pulse-O0P9w3N0W4t6
120G /tmp/img
8.0K /tmp/mintUpdate
du: cannot read directory `/tmp/cvcd': Permission denied
4.0K /tmp/cvcd
4.0K /tmp/ssh-fyPBrSQD2054
4.0K /tmp/orbit-peter
8.0K /tmp/sni-qt_skype_2251-QXMSXq
4.0K /tmp/.ICE-unix
4.0K /tmp/VMwareDnD
8.0K /tmp/matecorba-peter
4.0K /tmp/.X11-unix
4.0K /tmp/.com.google.Chrome.XybSSu
4.0K /tmp/hsperfdata_peter
4.0K /tmp/jna
4.0K /tmp/keyring-GJe9xq
du: cannot read directory `/tmp/pulse-PKdhtXMmr18n': Permission denied
4.0K /tmp/pulse-PKdhtXMmr18n
120G /tmp

This example shows some typical output lines.

Each line represents a directory, and the first column shows the size of the content of that directory.

The last line is a summary line and shows the total size of the content of /tmp (or at least an estimate of it, counting only the files and directories that the user can read).

The lines ending with "Permission denied" show that the du tool couldn't descend into these directories, because the user who launched the du command doesn't have the rights to read them.

This approach can be used to track down the big disk space consumers. I prefer to start examining the disk usage with --max-depth=1 from the root directory and then work my way down the directories to find the files that consume too much disk space.

For example, using df I notice that my filesystem mounted at /tmp is reaching its limit (it has only 121GB of disk space and 120GB of it is in use). Therefore I launch the du command from above and get the output shown in the example. Now I know that the disk space is used inside /tmp/img, so I can launch du --max-depth=1 -h /tmp/img to drill down further. These steps can be repeated until you find the files that consume the space.
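
The drill-down can be made a bit more comfortable by sorting the du output so that the largest directories come first. This is a minimal sketch under a few assumptions: the function name biggest_dirs and the default of 10 lines are illustrative, and --max-depth assumes GNU du.

```shell
# Show a directory and its immediate subdirectories, largest first.
# The name and the default count of 10 are illustrative choices.
biggest_dirs() {
    dir="${1:-.}"
    count="${2:-10}"
    # -k prints sizes in KiB so that `sort -n` can compare them numerically.
    # 2>/dev/null hides the "Permission denied" messages.
    du -k --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

The first line of the output is the total for the directory itself, and the lines after it are the subdirectories in descending order. Starting with biggest_dirs /tmp and continuing with biggest_dirs /tmp/img would follow the same steps as in the example above.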

Mismatch between du & df.

There can be a mismatch between the outputs of du and df. This is because df asks the filesystem how much space is actually allocated and free, whereas du walks the directory tree and adds up the sizes of the files it finds. It can happen that a file you have deleted isn't really gone yet and thus still takes up space. Even though it isn't completely gone, it no longer shows up in the directory listing, so du won't take it into account.
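
To see whether such a mismatch exists on a given filesystem, the two numbers can be put side by side. The sketch below is an illustration only: the function name usage_gap is made up, and it assumes the argument is the mount point of the filesystem. A small gap is normal because of filesystem metadata, and the gap is overstated when du is run by a user who cannot read every directory.

```shell
# Compare what df and du report for one filesystem, in KiB.
# The function name is illustrative; pass the mount point as argument.
usage_gap() {
    mount_point="${1:-/}"
    # Third column of the POSIX df output is the used space.
    df_used=$(df -Pk "$mount_point" | awk 'NR == 2 { print $3 }')
    # -s: one total, -k: KiB, -x: do not cross into other filesystems.
    du_used=$(du -skx "$mount_point" 2>/dev/null | cut -f1)
    echo "df used: ${df_used} KiB"
    echo "du used: ${du_used} KiB"
    echo "gap:     $((df_used - du_used)) KiB"
}
```

For the situation in this post it would be called as usage_gap /tmp, preferably as root so that du can read all directories.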

An example of a situation where such a mismatch might occur is an application that logs to a file. When you delete the file, the application may still have an open handle to it. The file won't show up in the directory listing anymore, but its space is not freed until the file handle is released. If this was a log file from an application with high availability requirements, this can be problematic since you cannot shut it down whenever you want. It is possible to force Linux to give the space back, but it involves actions with /proc. This is something I might post about at a later time.
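
Finding out which files are in this state is possible without changing anything on the system. The following is a minimal, Linux-only sketch: on Linux every open handle appears as a symbolic link under /proc/<pid>/fd, and the link target of a deleted file ends in " (deleted)". The function name list_deleted_open_files and the use of descriptor 9 in the demonstration are illustrative choices. Without root rights only your own processes can be inspected. If lsof is installed, lsof +L1 gives a similar overview.

```shell
# List open handles that point to deleted files.
# Optional argument: a single PID; by default all readable processes.
list_deleted_open_files() {
    pid_glob="${1:-[0-9]*}"
    # $pid_glob is left unquoted on purpose so the shell expands the pattern.
    for link in /proc/$pid_glob/fd/*; do
        target=$(readlink "$link" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}

# Small demonstration with the current shell playing the application.
demo_file=$(mktemp)
exec 9>"$demo_file"            # open the file on descriptor 9
rm "$demo_file"                # the name is gone, the handle is not
list_deleted_open_files "$$"   # shows /proc/<pid>/fd/9 -> ... (deleted)
exec 9>&-                      # closing the handle releases the space
```

The PID in the reported path tells you which process still holds the file.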