Reputation: 806
Based on this post, using shuf is the fastest way:
import sh
sh.shuf("words.txt", out="shuffled_words.txt")
However, this code shuffles the header as well. My file has a header, and I don't want it shuffled in with the data.
Upvotes: 2
Views: 490
Reputation: 57085
Copy the content of the file into another file without the header:
with open("words.txt") as infile, open("words-nohead.txt", "w") as outfile:
    for i, line in enumerate(infile):
        # Skip line 0 (the header); copy every other line.
        if i:
            outfile.write(line)
Then shuffle the headerless file. Finally, concatenate the first line of the original file and the shuffled headerless file into shuffled_words.txt (I think you can use sh.cat() for this), and remove the interim files.
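If the file fits in memory, the same steps can be sketched in plain Python without interim files. Note that this swaps shuf for random.shuffle, so it is not the shuf-based approach from the question. The function name shuffle_keep_header and its seed parameter are made up for illustration.

```python
import random


def shuffle_keep_header(src, dst, seed=None):
    """Copy src to dst with the first line kept in place and the rest shuffled.

    Hypothetical helper: the name and the optional seed are assumptions
    for this sketch, not part of any library.
    """
    with open(src) as infile:
        header = infile.readline()   # first line stays on top
        rows = infile.readlines()    # everything else is held in memory

    # If the last line has no trailing newline, add one so it cannot
    # merge with the following line after being moved.
    if rows and not rows[-1].endswith("\n"):
        rows[-1] += "\n"

    random.Random(seed).shuffle(rows)  # in-place shuffle of the data lines

    with open(dst, "w") as outfile:
        outfile.write(header)
        outfile.writelines(rows)
```

Calling shuffle_keep_header("words.txt", "shuffled_words.txt") should then write the header followed by the data lines in random order; passing a seed only makes the order repeatable.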
Actually, you do not need Python for this. Bash alone suffices:
head -n 1 words.txt > shuffled_words.txt
tail -n+2 words.txt | shuf >> shuffled_words.txt
Bear in mind that shuf reads the whole file into memory anyway, so you must have enough memory to hold the file.
Upvotes: 2