P For Paranoia OR a quick way of overwriting a partition with random-like data

(General Surgeon’s warning: The following post contains doses of paranoia which might exceed your recommended daily dosage. Fnord!).

Much of the data sanitisation literature advises overwriting partitions with random data (btw, SANS Institute research claims that even a single pass with /dev/zero is enough to defeat MFM-based recovery, but YPMV). Even leaving Gutmann-like techniques aside, generating random data takes a long time on an average system that lacks a cryptographic accelerator. To speed things up, /dev/urandom can be used in lieu of /dev/random: when read, the non-blocking /dev/urandom device returns as many bytes as are requested, even if the entropy pool is depleted. As a result, its output has traditionally been considered less cryptographically sound than that of /dev/random, but it never stalls waiting for entropy and is therefore faster.
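
The post does not say exactly how the /dev/urandom test file was produced; from the shell, `dd if=/dev/urandom of=P_for_Paranoia.ur bs=1M count=16` would do it. A minimal Perl sketch of the same idea follows. `fill_from_urandom` is a hypothetical helper name, and the 1 MiB chunk size is an arbitrary choice. Be careful: if the output path is a partition device rather than a plain file, it gets overwritten.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: copy $bytes bytes from /dev/urandom into $out.
# WARNING: if $out is a partition device, its contents are overwritten.
sub fill_from_urandom {
    my ($out, $bytes) = @_;
    my $chunk = 1 << 20;    # read 1 MiB at a time (an arbitrary choice)
    open(my $src, '<:raw', '/dev/urandom') or die "open /dev/urandom: $!\n";
    open(my $dst, '>:raw', $out)           or die "open $out: $!\n";
    while ($bytes > 0) {
        my $want = $bytes < $chunk ? $bytes : $chunk;
        # sysread may return fewer bytes than asked for, so count what we got
        my $got = sysread($src, my $buf, $want);
        die "read /dev/urandom: $!\n" unless $got;
        print {$dst} $buf or die "write $out: $!\n";
        $bytes -= $got;
    }
    close($dst) or die "close $out: $!\n";
    close($src);
    return;
}

# Usage: perl fill.pl OUTFILE BYTES, e.g. perl fill.pl P_for_Paranoia.ur 16777216
if (@ARGV == 2) {
    fill_from_urandom(@ARGV);
}
```

The loop tracks the byte count actually returned by `sysread` instead of assuming every read is complete.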

If time is of the essence and your paranoia level is low, there is an alternative that both provides random-like data (so you do not have to fall back to /dev/zero and keep your fingers crossed) and is significantly faster. Enter Truecrypt. Truecrypt creates encrypted partitions and file containers using a variety of algorithms that have been submitted to peer review and are deemed secure for general use. I can hear sceptical Johnny shouting “Hey wait a minute now, this is NOT random data, what the heck are you talking about?”. True, but the output of a well-designed cipher is meant to be statistically indistinguishable from random data, so, Truecrypt headers aside, let’s see what ent reports. For those of you not familiar with ent, it is a tool that performs a statistical analysis of a given file (or of a bitstream, if you tell it so), giving you an idea of its entropy along with other way, way useful statistics. For more information, see man 1 ent.
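
As far as I can tell, ent's entropy figure is the Shannon entropy of the file's byte-value histogram, in bits per byte (8 is the maximum). The following is a minimal sketch of that calculation plus the arithmetic mean, not ent's own code; `add_counts` and `byte_stats` are names made up for illustration. The "optimum compression" percentage is assumed to be 100 × (8 - H) / 8, truncated to an integer. That assumption matches the numbers reported below, e.g. 4.898835 bits per byte for /etc/passwd gives 38 percent.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: add the byte-value histogram of $data into @$count.
sub add_counts {
    my ($count, $data) = @_;
    $count->[$_]++ for unpack 'C*', $data;
    return;
}

# Hypothetical helper: from a 256-entry histogram return
# (entropy in bits per byte, mean byte value, compression percent).
sub byte_stats {
    my @count = @_;
    my $n = 0;
    $n += $_ for @count;
    die "no data\n" unless $n;
    my ($entropy, $sum) = (0, 0);
    for my $v (0 .. 255) {
        next unless $count[$v];
        my $p = $count[$v] / $n;
        $entropy -= $p * log($p) / log(2);   # Shannon entropy, base 2
        $sum     += $v * $count[$v];
    }
    # compression estimate ASSUMED to be 100 * (8 - H) / 8, as ent seems to report
    return ($entropy, $sum / $n, 100 * (8 - $entropy) / 8);
}

# Usage: perl entropy.pl FILE
if (@ARGV) {
    my $file = $ARGV[0];
    open(my $fh, '<:raw', $file) or die "open $file: $!\n";
    my @count = (0) x 256;
    while (read($fh, my $buf, 1 << 16)) {    # 64 KiB at a time
        add_counts(\@count, $buf);
    }
    my ($h, $mean, $compress) = byte_stats(@count);
    printf "Entropy = %.6f bits per byte.\n", $h;
    printf "Optimum compression would reduce the size by %d percent.\n", $compress;
    printf "Arithmetic mean value of data bytes is %.4f (127.5 = random).\n", $mean;
}
```

Reading in chunks keeps memory flat, so the same sketch works on a 16 MiB container or a whole partition.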

For the purposes of this demonstration, I have created the following files:

  • an AES-encrypted Truecrypt container (P_for_Paranoia.tc)
  • an equal-sized file filled with data from /dev/urandom (P_for_Paranoia.ur; I know, but I was in a hurry)
  • a well-defined binary object in the form of a shared library (Firefox's libxul.so)
  • a system configuration file (/etc/passwd)
  • a seed file (seed) containing a mixture of English and Chinese literature, some C code and strings(1) output from the unencrypted swap (wink-wink, nudge-nudge)

Let’s run ent on each of them and see what results we get (for the hastily written, non-strict-compliant Perl code, look at the end of the article):

    ################################################################################
    processing file: P_for_Paranoia.tc 16777216 bytes
    Entropy = 7.999988 bits per byte.

    Optimum compression would reduce the size
    of this 16777216 byte file by 0 percent.

    Chi square distribution for 16777216 samples is 288.04, and randomly
    would exceed this value 10.00 percent of the times.

    Arithmetic mean value of data bytes is 127.4834 (127.5 = random).
    Monte Carlo value for Pi is 3.141790185 (error 0.01 percent).
    Serial correlation coefficient is 0.000414 (totally uncorrelated = 0.0).

    ################################################################################
    processing file: P_for_Paranoia.ur 16777216 bytes
    Entropy = 7.999989 bits per byte.

    Optimum compression would reduce the size
    of this 16777216 byte file by 0 percent.

    Chi square distribution for 16777216 samples is 244.56, and randomly
    would exceed this value 50.00 percent of the times.

    Arithmetic mean value of data bytes is 127.4896 (127.5 = random).
    Monte Carlo value for Pi is 3.143757139 (error 0.07 percent).
    Serial correlation coefficient is -0.000063 (totally uncorrelated = 0.0).

    ################################################################################
    processing file: seed 16671329 bytes
    Entropy = 5.751438 bits per byte.

    Optimum compression would reduce the size
    of this 16671329 byte file by 28 percent.

    Chi square distribution for 16671329 samples is 101326138.53, and randomly
    would exceed this value 0.01 percent of the times.

    Arithmetic mean value of data bytes is 82.9071 (127.5 = random).
    Monte Carlo value for Pi is 3.969926804 (error 26.37 percent).
    Serial correlation coefficient is 0.349229 (totally uncorrelated = 0.0).

    ################################################################################
    processing file: /etc/passwd 1854 bytes
    Entropy = 4.898835 bits per byte.

    Optimum compression would reduce the size
    of this 1854 byte file by 38 percent.

    Chi square distribution for 1854 samples is 20243.47, and randomly
    would exceed this value 0.01 percent of the times.

    Arithmetic mean value of data bytes is 86.1019 (127.5 = random).
    Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
    Serial correlation coefficient is 0.181177 (totally uncorrelated = 0.0).

    ################################################################################
    processing file: /usr/lib/firefox-4.0.1/libxul.so 31852744 bytes
    Entropy = 5.666035 bits per byte

    Optimum compression would reduce the size
    of this 31852744 byte file by 29 percent.

    Chi square distribution for 31852744 samples is 899704400.21, and randomly
    would exceed this value 0.01 percent of the times.

    Arithmetic mean value of data bytes is 74.9209 (127.5 = random).
    Monte Carlo value for Pi is 3.563090648 (error 13.42 percent).
    Serial correlation coefficient is 0.391466 (totally uncorrelated = 0.0).
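
The last two lines of each report can be approximated as follows. This sketch assumes ent treats every 6 bytes as a pair of 24-bit (x, y) coordinates and counts how many land inside a circle of radius 2^24 - 1. It also assumes ent's serial correlation pairs each byte with the next one, wrapping the last byte around to the first. `monte_carlo_pi` and `serial_correlation` are hypothetical names, and this is not ent's source. If the coordinate assumption holds, it also explains the 4.000000000 for /etc/passwd: ASCII text never sets the top bit of a byte, so every coordinate stays below half the radius and every point falls inside the circle.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: Monte Carlo estimate of Pi, ASSUMING ent-style
# 6-byte groups read as two 24-bit coordinates.
sub monte_carlo_pi {
    my ($data) = @_;
    my $r = 256**3 - 1;                      # largest 24-bit value
    my ($inside, $total) = (0, 0);
    for (my $i = 0; $i + 6 <= length $data; $i += 6) {
        my @c = unpack 'C6', substr($data, $i, 6);
        my $x = ($c[0] << 16) | ($c[1] << 8) | $c[2];
        my $y = ($c[3] << 16) | ($c[4] << 8) | $c[5];
        $total++;
        $inside++ if $x * $x + $y * $y <= $r * $r;
    }
    return $total ? 4 * $inside / $total : undef;
}

# Hypothetical helper: lag-1 serial correlation of the byte values, with
# the last byte paired with the first; undef when all bytes are equal.
sub serial_correlation {
    my ($data) = @_;
    my $n = length $data;
    return if $n < 2;
    my ($sxy, $sx, $sxx) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my $x    = vec($data, $i, 8);
        my $next = vec($data, ($i + 1) % $n, 8);
        $sxy += $x * $next;
        $sx  += $x;
        $sxx += $x * $x;
    }
    my $den = $n * $sxx - $sx * $sx;
    return $den ? ($n * $sxy - $sx * $sx) / $den : undef;
}

# Usage: perl mc.pl FILE (slurps the whole file into memory)
if (@ARGV) {
    open(my $fh, '<:raw', $ARGV[0]) or die "open $ARGV[0]: $!\n";
    my $data = do { local $/; <$fh> };
    my $pi   = monte_carlo_pi($data);
    my $scc  = serial_correlation($data);
    printf "Monte Carlo value for Pi is %s\n", defined $pi ? sprintf('%.9f', $pi) : 'undefined';
    printf "Serial correlation coefficient is %s\n", defined $scc ? sprintf('%.6f', $scc) : 'undefined';
}
```

A value near 3.14159 and a coefficient near 0.0 are what random-like data should produce, which is what both the container and /dev/urandom show above.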

Focusing on entropy, we see that:

    Truecrypt:    Entropy = 7.999988 bits per byte.
    /dev/urandom: Entropy = 7.999989 bits per byte.

These are directly comparable (if you trust ent, that is), much better than the well-structured binary file (5.666035 bits per byte) and head and shoulders above our seed file (5.751438 bits per byte; a conglomerate unlikely to be encountered in practice). The chi-square test is where the two differ most: the container scores 288.04, a value a truly random sequence would exceed 10 percent of the time, against 244.56 (50 percent) for /dev/urandom, a five-fold difference in favor of /dev/urandom. Even so, the container is still far ahead of our other test cases, whose chi-square values random data would exceed only 0.01 percent of the time.
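
The chi-square value is Pearson's statistic for the byte histogram measured against a uniform distribution over the 256 byte values. With 255 degrees of freedom, random data should land around 255 (the mean of that distribution), which is why 244.56 and 288.04 look unremarkable next to 20243.47 for /etc/passwd. Below is a minimal sketch of the statistic itself; `histogram` and `chi_square` are hypothetical names, and the sketch does not reproduce ent's "would exceed this value" percentage.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: 256-entry byte histogram, read in 64 KiB chunks.
sub histogram {
    my ($fh) = @_;
    my @count = (0) x 256;
    while (read($fh, my $buf, 1 << 16)) {
        $count[$_]++ for unpack 'C*', $buf;
    }
    return @count;
}

# Hypothetical helper: Pearson chi-square of a byte histogram against
# the uniform distribution, where each value is expected n/256 times.
sub chi_square {
    my @count = @_;
    my $n = 0;
    $n += $_ for @count;
    die "no data\n" unless $n;
    my $expected = $n / 256;
    my $chi = 0;
    $chi += ($_ - $expected) ** 2 / $expected for @count;
    return $chi;
}

# Usage: perl chi.pl FILE
if (@ARGV) {
    open(my $fh, '<:raw', $ARGV[0]) or die "open $ARGV[0]: $!\n";
    printf "Chi square = %.2f (255 degrees of freedom)\n", chi_square(histogram($fh));
}
```

A file made of a single repeated byte scores 255 × n, the worst possible value, while perfectly flat histograms score 0.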

From the above, there is a strong indication that when you need random-like data and /dev/urandom is too slow, for example when you want to “randomize” your swap area (more on this in an upcoming post), a Truecrypt volume will do in a pinch.

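    #!/usr/bin/env perl
    use warnings;
    use File::stat;
    # a 5 min script (AKA no strict compliance) to supplement results for a blog article
    # why perl? Nostalgia :-)

    # test subjects: Truecrypt container, /dev/urandom dump, mixed seed file,
    # a small config file and a large shared library
    @subjects = qw(P_for_Paranoia.tc P_for_Paranoia.ur seed /etc/passwd /usr/lib/firefox-4.0.1/libxul.so);

    # run ent(1) on one file and print its report under a separator banner
    sub analyzeEnt {
        my ($file) = @_;
        my $st = stat($file) or do { warn "cannot stat $file: $!\n"; return };
        # list-form pipe open: the file name reaches ent without going through a shell
        open(my $pipe, '-|', 'ent', $file) or do { warn "cannot run ent: $!\n"; return };
        my $ent = do { local $/; <$pipe> } . "\n";
        close($pipe);
        print "#" x 80 . "\nprocessing file: $file " . $st->size . " bytes\n" . $ent;
    }

    foreach my $subject (@subjects) {
        analyzeEnt($subject);
    }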