Reputation: 5807
The following code makes a list of names and 'numbers' and gives each person a random age between 15 and 90.
#!/bin/sh
file=$1
n=$2
# if number is zero exit
if [ "$n" -eq "0" ]
then
    exit 0
fi
echo "Generating list of $n people."
for i in `seq 1 $n`;
do
let "NUM=($RANDOM%75)+15"
echo "name$i $NUM (###)###-####" >> $file
done
echo "List generated."
With it I'm attempting to make a list of 1M names. I expected it to be slow, but it was so slow that I lost patience and tried 10K names instead. That was slow too, though it finished in a few seconds.
The reason I'm generating the names is to sort them. What surprised me is that when I sorted the list of 10K names it was instant.
How can this be?
Is there something that's making this go ungodly slow? Both the sorting and the generating access files, so how can the sorting be so much faster? Is my random number math in the list generator what's slowing it down?
Here's my sorting script.
#!/bin/sh
#first argument is list to be sorted, second is output file
tr -s ' ' < "$1" | sort -n -k2 > "$2"
Upvotes: 2
Views: 1921
Reputation: 16545
Not a new answer, just new code.
This is, IMHO, a good middle ground between readable and efficient code (as efficient as you can get in Bash, anyway; it IS slow, it's a shell...).
for ((i=1;i<=n;i++));
do
echo "name$i $((NUM=(RANDOM%75)+15)) (###)###-####"
done > "$file"
An alternative, not using a classic counter loop:
i=1
while ((i<=n)); do
echo "name$((i++)) $((NUM=(RANDOM%75)+15)) (###)###-####"
done > "$file"
Both are about the same speed.
The fixes are the same ones mentioned in the other answers: do the arithmetic in the shell instead of spawning seq, and redirect the loop's output to the file once instead of appending on every iteration.
Upvotes: 6
Reputation: 48212
Using the shell to generate random numbers like this isn't really what it was designed to do. You'll likely be better off coding something to generate random numbers from a uniform distribution in another language, like Fortran, Perl or C.
In your code, one thing that's going to be very slow is generating the sequence of numbers from 1 to 1,000,000 and expanding them all as arguments to the for loop. That's likely very wasteful, but you should profile if you want to be sure. As chaos points out, appending to the file on every iteration is also likely to be very costly!
In Python, you can do something like this:
#!/usr/bin/python
import random
count = 1
print ' '.join( ['name', 'age'] )
while count <= 1000000:
    age = random.randrange(15, 90)   # 15..89, same range as ($RANDOM%75)+15
    name = 'name' + str(count)
    print ' '.join( [ name, str(age) ] )
    count = count + 1
Running that on my laptop takes ~10 seconds. In your script, just expanding the seq from 1 to 1000000 takes ~10 seconds; once you add the random number generation, the script takes over three minutes on the same machine. I got frustrated just as you did, and played around with the script to try and make it faster. Here's my shortened version of your code that I'm playing with:
for x in `seq 1 10000`; do
let "NUM=($RANDOM%75)+15"
echo $NUM >> test.txt
done
Running this takes about 5.3s:
$ time ./test.sh
real 0m5.318s
user 0m1.305s
sys 0m0.675s
Removing the file appending and simply redirecting STDOUT to a single file gives the following script:
for x in `seq 1 10000`; do
let "NUM=($RANDOM%75)+15"
echo $NUM
done
Running this takes about half a second:
$ time ./test.sh > test.txt
real 0m0.516s
user 0m0.449s
sys 0m0.067s
The slowness of your program is at least partly due to appending to that file on every iteration. Curiously, when I tried replacing the seq call with a for loop, I didn't notice any speedup.
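For reference, the for-loop variant I mean is along these lines (a sketch, not necessarily the exact snippet I timed):
for ((x=1; x<=10000; x++)); do
    let "NUM=($RANDOM%75)+15"
    echo $NUM
done > test.txt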
Upvotes: 5
Reputation: 361977
for i in `seq 1 $n`
Yikes! This is generating 1,000,000 arguments to the for loop. That seq call will take a long, long, long time. Try
for ((i = 1; i <= n; i++))
Notice the lack of dollar signs, by the way. Peculiarly, the var++ syntax requires you to omit the dollar sign from the variable name. You are also allowed to use or omit them elsewhere: it could be i <= n or $i <= $n, either one. The way I figure, you should omit dollar signs entirely in let, declare, and for ((x; y; z)) statements. See the ARITHMETIC EVALUATION section of the bash man page for the full explanation.
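Applied to your generator, the loop would look something like this (a sketch that also redirects the loop's output once, as the other answers suggest; run it under bash):
for ((i = 1; i <= n; i++)); do
    NUM=$((RANDOM % 75 + 15))    # random age from 15 to 89
    echo "name$i $NUM (###)###-####"
done > "$file"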
Upvotes: 5
Reputation: 13955
Try this for your main loop:
seq 1 $n | while read i
do
let "NUM=($RANDOM%75)+15"
echo "name$i $NUM (###)###-####"
done > $file
This will make the seq and the loop work in parallel instead of waiting for the seq to finish before starting the loop. This will be faster on multiple cores/CPUs but slightly slower on a single core.
And I agree with the others here: Does it have to be bash?
Edit: added chaos's suggestion to keep the file open, rather than re-opening it for append for each name.
Upvotes: 2
Reputation:
(I have a feeling you may not like this answer, but you technically didn't specify the answer had to remain in bash! :P)
It's common to rapidly develop something in a prototyping language and then possibly switch to another language (often C) as needed. Here's a very similar program in Python for you to compare:
#!/usr/bin/python
import sys
import random
def main(args=None):
    args = args or []
    if len(args) == 1:
        # default first parameter
        args = ["-"] + args
    if len(args) != 2:
        sys.stderr.write("error: invalid parameters\n")
        return 1
    n = int(args[1])
    output = sys.stdout if args[0] == "-" else open(args[0], "a")
    for i in xrange(1, n + 1):
        num = random.randint(15, 89)   # ages 15-89, matching ($RANDOM%75)+15
        output.write("name%s %s (###)###-####\n" % (i, num))
    sys.stderr.write("List generated.\n")  # see note below
if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
Note: using stdout only for "real output" (and stderr for status notifications) allows this program to be run in parallel with others, piping data directly from the stdout of one to the stdin of another. (It's possible with special files on *nix, but it's just easier if you can use stdout.) Example:
$ ./rand_names.py 1000000 | sort -n -k2 > output_file
And it should be fast enough:
$ time ./rand_names.py 1000000 > /dev/null
List generated.

real 0m16.393s
user 0m15.108s
sys 0m0.171s
Upvotes: 2
Reputation: 17307
I guess the '>> $file' is the source of your problem. On my system your script takes about 8 seconds to generate 10,000 names. If I remove the $file argument, write to stdout instead, and capture the whole thing to a file with a single redirect, it takes under a second.
$ time ./gen1.sh n1.txt 10000
Generating list of 10000 people.
List generated.

real 0m7.552s
user 0m1.355s
sys 0m1.886s
$ time ./gen2.sh 10000 > n2.txt

real 0m0.806s
user 0m0.576s
sys 0m0.140s
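For reference, gen2.sh is essentially the original loop writing to stdout; something like this (a sketch of what I ran, reconstructed since only the timings are shown above):
#!/bin/sh
# gen2.sh (sketch): same loop, but no per-line '>>' append;
# the caller redirects the whole output to a file once
n=$1
for i in `seq 1 $n`; do
    let "NUM=($RANDOM%75)+15"
    echo "name$i $NUM (###)###-####"
done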
Upvotes: 4
Reputation: 124335
Don't know if it's the whole story, but re-opening the file to append to it for every name can't be helping anything. Doing the whole thing in any context where you can keep an open file handle to write to should help a lot.
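For example, something along these lines keeps a single file descriptor open for the whole run (a minimal sketch, not the asker's exact script; the ((...)) loop needs bash):
exec 3> "$file"                  # open the output file once
for ((i = 1; i <= n; i++)); do
    echo "name$i $((RANDOM % 75 + 15)) (###)###-####" >&3
done
exec 3>&-                        # close the descriptor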
Upvotes: 3