user14070

Generating a random binary file

Why did it take 5 minutes to generate a 1 KiB file on my (low-end laptop) system with little load? And how could I generate a random binary file faster?

$ time dd if=/dev/random of=random-file bs=1 count=1024
1024+0 records in
1024+0 records out
1024 bytes (1.0 kB) copied, 303.266 s, 0.0 kB/s

real    5m3.282s
user    0m0.000s
sys 0m0.004s
$ 

Notice that dd if=/dev/random of=random-file bs=1024 count=1 doesn't work: it produces a file of random length, usually under 50 B on most runs. Does anyone have an explanation for this too?

Upvotes: 28

Views: 33342

Answers (5)

Lance Rushing

Reputation: 7640

Try /dev/urandom instead:

$ time dd if=/dev/urandom of=random-file bs=1 count=1024

From: http://stupefydeveloper.blogspot.com/2007/12/random-vs-urandom.html

The main difference between random and urandom is how they pull random data from the kernel. random always takes data from the entropy pool; if the pool is empty, random blocks until it has been refilled enough. urandom generates data using SHA (or another algorithm, sometimes MD5) when the kernel entropy pool is empty, so urandom never blocks.
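As a side note (a sketch, reusing the same file name as above): dd with bs=1 issues 1024 separate one-byte reads, so even with /dev/urandom it is faster to request one larger block:

$ time dd if=/dev/urandom of=random-file bs=1024 count=1

Unlike /dev/random, /dev/urandom should satisfy a small read like this in full, so the file should come out at the expected 1 KiB.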

Upvotes: 16

Mark

Reputation: 401

Old thread, but like Tobbe mentioned, I needed something like this, only faster.

So here is a shell way of doing the same thing, just much quicker than random/urandom. It is useful when creating really big files. Admittedly it is not fully random, but probably close enough, depending on your needs.

# dump the first 1 GB of physical memory to a file (needs root)
dd if=/dev/mem of=test1G.bin bs=1M count=1024
# append 100 copies of that chunk to build the 100 GB file
touch test100G.bin
seq 1 100 | xargs -Inone cat test1G.bin >> test100G.bin

This will create a 100 GB file from the contents of your RAM (the first 1 GB; I assume you have that much RAM :) ). Note that it is probably also unsafe to share this file, since it may contain all kinds of sensitive data such as your passwords, so use it only for your own purposes :) Oh, and you need to run it as root for the very same reason.
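A quick sanity check (assuming the file names from the snippet above) before using the result:

$ ls -lh test1G.bin test100G.bin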

Upvotes: 3

Tobbe

Reputation: 149

Old thread, but I just needed the same thing. Old friend C came to the rescue, since I don't want to mess around with scripts. Here is my solution, which is good and quick enough for me:

// usage: ./program <outfile> <size-in-bytes>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char** argv) {
    long long i, s;
    FILE* f = fopen(argv[1], "wb");
    srand(time(NULL));               // seed the PRNG from the current time
    sscanf(argv[2], "%lld", &s);
    for (i = 0; i < s; i++) {
        fputc(rand() % 256, f);      // write one pseudo-random byte
    }
    fclose(f);
    return 0;
}
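For instance (the binary and output file names here are just placeholders), compiling and running it might look like this:

$ gcc -o genrand genrand.c
$ ./genrand random-file 1048576    # 1 MiB of pseudo-random bytes

Keep in mind that rand() is not cryptographically secure, so this is only appropriate when you just need filler bytes rather than real randomness.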

Upvotes: -1

Bruno Bronosky

Reputation: 70519

I wrote a script to test the speed of various hashing functions. For this I wanted files of "random" data, and I didn't want to use the same file twice, so that none of the functions had a kernel-cache advantage over the others. I found that both /dev/random and /dev/urandom were painfully slow. I chose to use dd to copy data off my hard disk starting at random offsets. I would NEVER suggest using this if you are doing anything security-related, but if all you need is noise, it doesn't matter where you get it. On a Mac use something like /dev/disk0; on Linux use /dev/sda.

Here is the complete test script:

tests=3
kilobytes=102400
commands=(md5 shasum)
count=0
test_num=0
time_file=/tmp/time.out
file_base=/tmp/rand

while [[ test_num -lt tests ]]; do
    ((test_num++))
    for cmd in "${commands[@]}"; do
        ((count++))
        file=$file_base$count
        touch $file
        # slowest
        #/usr/bin/time dd if=/dev/random of=$file bs=1024 count=$kilobytes >/dev/null 2>$time_file
        # slow
        #/usr/bin/time dd if=/dev/urandom of=$file bs=1024 count=$kilobytes >/dev/null 2>$time_file                                                                                                        
        # less slow
        /usr/bin/time sudo dd if=/dev/disk0 skip=$(($RANDOM*4096)) of=$file bs=1024 count=$kilobytes >/dev/null 2>$time_file
        echo "dd took $(tail -n1 $time_file | awk '{print $1}') seconds"
        echo -n "$(printf "%7s" $cmd)ing $file: "
        /usr/bin/time $cmd $file >/dev/null
        rm $file
    done
done

Here are the "less slow" /dev/disk0 results:

dd took 6.49 seconds
    md5ing /tmp/rand1:         0.45 real         0.29 user         0.15 sys
dd took 7.42 seconds
 shasuming /tmp/rand2:         0.93 real         0.48 user         0.10 sys
dd took 6.82 seconds
    md5ing /tmp/rand3:         0.45 real         0.29 user         0.15 sys
dd took 7.05 seconds
 shasuming /tmp/rand4:         0.93 real         0.48 user         0.10 sys
dd took 6.53 seconds
    md5ing /tmp/rand5:         0.45 real         0.29 user         0.15 sys
dd took 7.70 seconds
 shasuming /tmp/rand6:         0.92 real         0.49 user         0.10 sys

Here are the "slow" /dev/urandom results:

dd took 12.80 seconds
    md5ing /tmp/rand1:         0.45 real         0.29 user         0.15 sys
dd took 13.00 seconds
 shasuming /tmp/rand2:         0.58 real         0.48 user         0.09 sys
dd took 12.86 seconds
    md5ing /tmp/rand3:         0.45 real         0.29 user         0.15 sys
dd took 13.18 seconds
 shasuming /tmp/rand4:         0.59 real         0.48 user         0.10 sys
dd took 12.87 seconds
    md5ing /tmp/rand5:         0.45 real         0.29 user         0.15 sys
dd took 13.47 seconds
 shasuming /tmp/rand6:         0.58 real         0.48 user         0.09 sys

Here are the "slowest" /dev/random results:

dd took 13.07 seconds
    md5ing /tmp/rand1:         0.47 real         0.29 user         0.15 sys
dd took 13.03 seconds
 shasuming /tmp/rand2:         0.70 real         0.49 user         0.10 sys
dd took 13.12 seconds
    md5ing /tmp/rand3:         0.47 real         0.29 user         0.15 sys
dd took 13.19 seconds
 shasuming /tmp/rand4:         0.59 real         0.48 user         0.10 sys
dd took 12.96 seconds
    md5ing /tmp/rand5:         0.45 real         0.29 user         0.15 sys
dd took 12.84 seconds
 shasuming /tmp/rand6:         0.59 real         0.48 user         0.09 sys

You'll notice that /dev/random and /dev/urandom were not much different in speed. However, /dev/disk0 took 1/2 the time.

PS. I lessened the number of tests and removed all but two commands for the sake of "brevity" (not that I succeeded in being brief).

Upvotes: 3

Stephan202

Reputation: 61589

That's because on most systems /dev/random uses random data from the environment, such as static from peripheral devices. The pool of truly random data (entropy) which it uses is very limited. Until more data is available, output blocks.

Retry your test with /dev/urandom (notice the u), and you'll see a significant speedup.

See Wikipedia for more info: /dev/random does not output truly random data on every system, but it clearly does on yours.

Example with /dev/urandom:

$ time dd if=/dev/urandom of=/dev/null bs=1 count=1024
1024+0 records in
1024+0 records out
1024 bytes (1.0 kB) copied, 0.00675739 s, 152 kB/s

real    0m0.011s
user    0m0.000s
sys 0m0.012s

Upvotes: 38
