Reputation: 6189
I want to shuffle the lines of a file with a fixed seed so that I always get the same random order. The command I am using is as follows:
sort -R file.txt | head -200 > file.sff
What change could I make so that it sorts with a fixed random seed?
Upvotes: 27
Views: 15271
Reputation: 321
The shuf command (part of GNU coreutils on Linux) can take a file as a fixed source of randomness using the parameter --random-source:
shuf --random-source=some_file.txt file.txt | head -n200 > file.sff
If you don't want to supply an actual file, you can generate one on the fly with process substitution:
shuf --random-source=<(yes 42) file.txt | head -n200 > file.sff
Upvotes: 14
Reputation: 46846
You may not need to use external tools like sort, whose options and usage may vary depending on your operating system. Bash has an internal random number generator accessible through the $RANDOM variable. It's common practice to seed the generator by setting the variable, like so:
RANDOM=$$
or
RANDOM=$(date '+%s')
etc. But of course, you can also use a predictable seed in order to get predictable not-so-random results:
$ RANDOM=12345; echo $RANDOM
28207
$ RANDOM=12345; echo $RANDOM
28207
To reorder the lines of a file randomly, first read the file into an array using mapfile:
$ mapfile -t a < source.txt
Then simply rewrite the array indices:
$ n=${#a[@]}; for i in "${!a[@]}"; do a[$((RANDOM+n))]="${a[$i]}"; unset "a[$i]"; done
When reading a non-associative array, bash naturally orders elements in ascending order of index value.
Note that the new index for each line has the number of array elements added to it to avoid collisions within that range. This solution is still fallible -- there's no guarantee that $RANDOM will produce unique numbers. You can mitigate that risk with extra code that checks for prior use of each index, or reduce the risk with bit-shifting:
... a[$(( (RANDOM<<15)+RANDOM+${#a[@]} ))]= ...
This makes your index values into a 30-bit unsigned int instead of a 15-bit unsigned int.
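As a rough sketch of the "check for prior use of each index" idea, the function below redraws a 30-bit index whenever the slot is already taken, so no line can be overwritten. The name seeded_reindex is hypothetical, and the order produced for a given seed may differ between Bash versions.

```bash
#!/usr/bin/env bash
# Sketch: deterministic shuffle by giving each line a unique random index.
# seeded_reindex is a hypothetical helper; usage: seeded_reindex SEED < file
seeded_reindex() {
  local seed=$1 line idx
  local -a in=() out=()
  mapfile -t in              # read all of stdin, one line per element
  RANDOM=$seed               # fixed seed => same sequence on every run
  for line in "${in[@]}"; do
    # Draw 30-bit indices until an unused slot turns up.
    while :; do
      idx=$(( (RANDOM << 15) + RANDOM ))
      [[ -z ${out[idx]+set} ]] && break
    done
    out[idx]=$line
  done
  # Indexed arrays expand in ascending index order, which is the shuffle.
  if (( ${#out[@]} )); then
    printf '%s\n' "${out[@]}"
  fi
}
```

Usage would look like: seeded_reindex 12345 < source.txt | head -n 200 > file.sff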
Upvotes: 2
Reputation: 295373
The GNU implementation of sort has a --random-source argument. Passing this argument with the name of a file with known contents will produce the same output on every run.
See the Random sources documentation in the GNU coreutils manual, which contains the following sample implementation and example:
get_seeded_random()
{
  seed="$1"
  openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt \
    </dev/zero 2>/dev/null
}

shuf -i1-100 --random-source=<(get_seeded_random 42)
Since GNU sort is also part of coreutils, the relevant documentation applies there as well:
sort --random-source=<(get_seeded_random 42) -R file.txt | head -200 > file.sff
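One thing to keep in mind is that sort -R orders lines by a random hash of their keys, so identical lines end up next to each other, whereas shuf produces a true permutation. If that matters, a variant along these lines may fit better. This is only a sketch: seeded_sample is a hypothetical wrapper name, it assumes GNU shuf and openssl are installed, and the exact order for a given seed may depend on the openssl version.

```bash
#!/usr/bin/env bash
# get_seeded_random as given in the coreutils manual (quoted above).
get_seeded_random() {
  seed="$1"
  openssl enc -aes-256-ctr -pass pass:"$seed" -nosalt \
    </dev/zero 2>/dev/null
}

# Hypothetical wrapper: print N lines of FILE in an order fixed by SEED.
# Usage: seeded_sample SEED N FILE
seeded_sample() {
  local seed=$1 n=$2 file=$3
  # -n limits the output, so no separate head is needed.
  shuf --random-source=<(get_seeded_random "$seed") -n "$n" -- "$file"
}
```

For the command in the question this would be: seeded_sample 42 200 file.txt > file.sff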
Upvotes: 30
Reputation: 107040
If you're randomly shuffling lines, you're not sorting. I haven't seen a sort with a --random-source option before. It'd be interesting if it does exist. However, that's not sorting the lines in a fixed order.
I believe you'll have to write a program to do that, and I don't think Bash can quite do it.
Actually, it might. The $RANDOM shell variable yields a random number from 0 to 32767. You can assign a seed to RANDOM and the same random number sequence will be produced every time. You can use a card dealing algorithm: read each line into a Bash array, then use the card dealing algorithm to pick each line.
I'm not going to write a test program -- especially in Bash, but you should get the idea.
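For reference, the card dealing idea can be sketched in Bash as a Fisher-Yates shuffle. This is a minimal sketch under a few assumptions: deal_lines is a hypothetical name, two $RANDOM draws are combined into a 30-bit value so files longer than 32768 lines still work, and the modulo step introduces a slight bias. The order produced for a given seed may also differ between Bash versions.

```bash
#!/usr/bin/env bash
# Sketch of a seeded "card dealing" (Fisher-Yates) shuffle in pure Bash.
# deal_lines is a hypothetical helper; usage: deal_lines SEED < file
deal_lines() {
  local seed=$1 i j tmp
  local -a deck=()
  mapfile -t deck            # one array element per input line
  RANDOM=$seed               # fixed seed => same sequence on every run
  # Walk from the last card down, swapping each with a random earlier one.
  for (( i = ${#deck[@]} - 1; i > 0; i-- )); do
    # 30-bit draw reduced to the range 0..i (slightly biased by the modulo).
    j=$(( ((RANDOM << 15) | RANDOM) % (i + 1) ))
    tmp=${deck[i]}
    deck[i]=${deck[j]}
    deck[j]=$tmp
  done
  if (( ${#deck[@]} )); then
    printf '%s\n' "${deck[@]}"
  fi
}
```

Applied to the question, that would be: deal_lines 42 < file.txt | head -n 200 > file.sff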
Upvotes: -6