Reputation: 5
Given that i have a very large log file, large enough that it can not be loaded into my main memory, and i wanted to sort it somehow, what would be the most recommended sorting technique and algorithm?
Upvotes: 0
Views: 1064
Reputation: 1
install
pip install bigsort
and then
cat unsorted.txt | bigsort > sorted.txt
Upvotes: 0
Reputation: 1899
If you are looking for an algorithm, you could apply merge sort.
Essentially you split your data into smaller chunks and sort each chunk. Then you take two sorted chunks and merge them (this can be done in a streaming fashion, just take the smallest value of the two chunks and increment)m this results in a bigger chunk. Keep doing this until you have merged all chunks.
Upvotes: 2
Reputation: 37
This depends on OS. If on Linux/Unix, you can use the sed command to print specific lines
sed -n -e 120p /var/log/syslog
Which would print line 120 of the syslog file. You could also use head
head -n 15 /var/log/syslog
Which would print the first 15 lines of the syslog file. There is also grep, tail, etc. for viewing portions of a large file. More detail here on these and more:
http://www.thegeekstuff.com/2009/08/10-awesome-examples-for-viewing-huge-log-files-in-unix
For Windows, there is Large Text File Viewer
Upvotes: -2
Reputation: 4430
If you have GNU sort
, use it. It knows how to deal with large files. For details, see the answers to How to sort big files on Unix SE. You will of course need sufficient free disk space.
Upvotes: 2