Reputation: 841
Is there an easy way to randomly shuffle fixed-size byte chunks?
I have a large binary file (say, hundreds of gigabytes) containing many fixed-size byte chunks. I do not care about the quality of the randomness, but I want to shuffle the two-byte elements in the file (the element size could be any fixed number of bytes, up to 8). Is there a way to combine Unix core tools to achieve this? If there is no such tool, I might have to write it in C. I would like to hear what people recommend.
Upvotes: 1
Views: 1325
Reputation: 36402
Here's a stupid shell trick to do so, using xxd and shuf, e.g.:
xxd -p -c 2 input_file | shuf - | xxd -p -r - output_file
I haven't tested it on huge files. You may want to use an intermediary file.
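A minimal sketch of the same pipeline with the chunk size as a variable and a sanity check, assuming GNU shuf and xxd are installed. The names CHUNK, sample.bin, shuffled.bin, before.txt, and after.txt are made up for illustration, and xxd -r -p is used here as a filter (stdin to stdout) rather than with explicit file arguments.

```shell
CHUNK=2                                  # record size in bytes (hypothetical variable)
printf 'AABBCCDDEEFF' > sample.bin       # six 2-byte records as test input

# One record per line as hex, shuffle the lines, convert back to binary.
xxd -p -c "$CHUNK" sample.bin | shuf | xxd -r -p > shuffled.bin

# The shuffled file should hold the same records in some order,
# so the sorted hex dumps must be identical.
xxd -p -c "$CHUNK" sample.bin   | sort > before.txt
xxd -p -c "$CHUNK" shuffled.bin | sort > after.txt
cmp -s before.txt after.txt && echo "same records"
```

Note that the hex text is larger than the binary (five characters per two-byte record, counting the newline), which is worth keeping in mind before trying this on a very large file.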
Alternately, you could use sort -R like so:
xxd -c 2 in_file |sort -R | cut -d' ' -f 2 | xxd -r -p - out_file
This depends on xxd outputting offsets, which should sort differently for each line.
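To see why the offsets matter: GNU sort -R orders lines by a random hash of the key, so identical lines get the same hash and come out next to each other. A small sketch, assuming GNU coreutils (the file names grouped.txt and mixed.txt are made up):

```shell
# Four lines, two distinct values. Identical lines hash the same under
# sort -R, so the duplicates always end up adjacent.
printf 'aa\nbb\naa\nbb\n' | sort -R > grouped.txt
uniq grouped.txt | wc -l     # 2: uniq collapsed the duplicates, so they were adjacent

# With a unique prefix per line (like the xxd offset), every line is
# distinct, so equal values are no longer forced together.
printf '0: aa\n1: bb\n2: aa\n3: bb\n' | sort -R | cut -d' ' -f 2 > mixed.txt
```

Without the unique prefix, equal records would be clustered rather than shuffled, which matters a lot for two-byte records where repeats are common.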
Upvotes: 4
Reputation: 2019
Try:
split -b $CHUNK_SIZE $FILE && find . -name "x*" | perl -MList::Util='shuffle' -e "print shuffle<>" | xargs cat > temp.bin
This creates a large number of files, each with a file size of $CHUNK_SIZE (or less, if the total file size doesn't divide by $CHUNK_SIZE), named xaa, xab, xac, etc., lists the files, shuffles the list, and joins them.
This will take up an extra 2x the disk space and probably won't work with large files.
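A hedged variant of the same idea that works in a scratch directory, so that find only sees the chunk files. It swaps the perl one-liner for shuf (an assumption that GNU coreutils is available) and uses split's -a option to lengthen the suffix, since the default two-letter suffix allows only 26^2 = 676 chunk files. The names sample_split.bin and tmp are made up for illustration.

```shell
FILE=sample_split.bin                   # hypothetical input file
CHUNK_SIZE=2
printf 'AABBCCDDEEFF' > "$FILE"         # six 2-byte records as test input

tmp=$(mktemp -d)                        # scratch directory holding only the chunks
# -a 4 allows 26^4 chunk names instead of the default 26^2
split -b "$CHUNK_SIZE" -a 4 "$FILE" "$tmp/x"
# List the chunk files, shuffle the list, and concatenate in that order.
find "$tmp" -type f -name 'x*' | shuf | xargs cat > temp.bin
rm -r "$tmp"
```

Even with longer suffixes, this still creates one file per record, so it is only a sketch for small inputs or large chunk sizes.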
Upvotes: 0
Reputation: 124646
Given the size of the input files to work with, this is a sufficiently complex problem. I wouldn't try to push the limits of shell scripting; it's best to code this in C or another language.
I'm not aware of a tool that can make this easy.
Upvotes: 1