Gustavo Diniz

Reputation: 5

How can I sort a very large log file, too large to load into main memory?

Given that I have a very large log file, large enough that it cannot be loaded into my main memory, and I want to sort it somehow, what would be the most recommended sorting technique and algorithm?

Upvotes: 0

Views: 1064

Answers (4)

by he

Reputation: 1

Install it:

pip install bigsort

and then run:

cat unsorted.txt | bigsort > sorted.txt

Upvotes: 0

TilmannZ

Reputation: 1899

If you are looking for an algorithm, you could apply merge sort.

Essentially, you split your data into chunks small enough to fit in memory, sort each chunk in memory, and write it back to disk. Then you take two sorted chunks and merge them. This can be done in a streaming fashion: repeatedly take the smaller of the two chunks' current values and advance in that chunk. The result is a bigger sorted chunk. Keep doing this until all chunks have been merged into one.
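A minimal Python sketch of the approach described above, assuming the log is a text file with one record per line and that plain string comparison gives the order you want (true, for example, when every line starts with an ISO 8601 timestamp). The function names and the `max_lines` chunk size are hypothetical. Chunks are sorted in memory and written to temporary files, then merged two at a time by streaming, as the answer describes; Python's `heapq.merge` could instead merge all chunks in a single k-way pass.

```python
import os
import tempfile
from itertools import islice


def _write_sorted_chunk(lines, tmpdir):
    """Sort one in-memory chunk of lines and write it to a temporary file."""
    # Only the last line of the file can lack a newline; add one so it
    # cannot fuse with another line after merging.
    if not lines[-1].endswith("\n"):
        lines[-1] += "\n"
    lines.sort()  # plain string order; assumed to be the order you want
    fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".chunk")
    with os.fdopen(fd, "w") as f:
        f.writelines(lines)
    return path


def _merge_two(path_a, path_b, tmpdir):
    """Stream-merge two sorted files by repeatedly emitting the smaller head line."""
    fd, out_path = tempfile.mkstemp(dir=tmpdir, suffix=".chunk")
    with open(path_a) as a, open(path_b) as b, os.fdopen(fd, "w") as out:
        line_a, line_b = a.readline(), b.readline()
        while line_a and line_b:
            if line_a <= line_b:
                out.write(line_a)
                line_a = a.readline()
            else:
                out.write(line_b)
                line_b = b.readline()
        # One side is exhausted; copy whatever remains of the other.
        while line_a:
            out.write(line_a)
            line_a = a.readline()
        while line_b:
            out.write(line_b)
            line_b = b.readline()
    os.remove(path_a)
    os.remove(path_b)
    return out_path


def external_sort(in_path, out_path, max_lines=100_000):
    """Sort a text file line by line, holding at most max_lines lines in memory."""
    out_dir = os.path.dirname(os.path.abspath(out_path))
    # Temporary chunks live next to the output so the final os.replace is a rename.
    with tempfile.TemporaryDirectory(dir=out_dir) as tmpdir:
        chunks = []
        with open(in_path) as f:
            while True:
                lines = list(islice(f, max_lines))
                if not lines:
                    break
                chunks.append(_write_sorted_chunk(lines, tmpdir))
        if not chunks:  # empty input: write an empty output file
            open(out_path, "w").close()
            return
        # Merge chunks two at a time until one sorted chunk remains.
        while len(chunks) > 1:
            merged = [
                _merge_two(chunks[i], chunks[i + 1], tmpdir)
                for i in range(0, len(chunks) - 1, 2)
            ]
            if len(chunks) % 2:  # the odd chunk out waits for the next pass
                merged.append(chunks[-1])
            chunks = merged
        os.replace(chunks[0], out_path)
```

For example, `external_sort("app.log", "app.sorted.log", max_lines=1_000_000)` keeps at most about a million lines in memory at once. During the final merge the temporary files can need up to roughly twice the input's size in free disk space.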

Upvotes: 2

Joe

Reputation: 37

This depends on the OS. On Linux/Unix, you can use the sed command to print specific lines:

sed -n -e 120p /var/log/syslog

This prints line 120 of the syslog file. You could also use head:

head -n 15 /var/log/syslog

This prints the first 15 lines of the syslog file. There are also grep, tail, and other tools for viewing portions of a large file. More detail on these and others here:

http://www.thegeekstuff.com/2009/08/10-awesome-examples-for-viewing-huge-log-files-in-unix

For Windows, there is Large Text File Viewer.
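The same idea as the sed and head commands above (read only the lines you need instead of loading the whole file) can be sketched in Python for use inside a script. This is a rough equivalent, not a reimplementation of either tool; the helper names `nth_line` and `head` are hypothetical. `itertools.islice` reads the file lazily, so lines before the requested one are read and discarded one at a time rather than held in memory.

```python
from itertools import islice


def nth_line(path, n):
    """Return line n (1-based), similar to `sed -n -e {n}p`; None if the file is shorter."""
    with open(path) as f:
        # Skip the first n - 1 lines lazily and take the next one.
        return next(islice(f, n - 1, n), None)


def head(path, count):
    """Return the first `count` lines as a list, similar to `head -n {count}`."""
    with open(path) as f:
        return list(islice(f, count))
```

For example, `nth_line("/var/log/syslog", 120)` corresponds to the sed command above and `head("/var/log/syslog", 15)` to the head command.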

Upvotes: -2

AlexP

Reputation: 4430

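If you have GNU sort, use it. It knows how to deal with large files: when the input does not fit in memory, it sorts pieces separately, writes them to temporary files, and merges them. For details, see the answers to "How to sort big files" on Unix SE. You will, of course, need sufficient free disk space for those temporary files.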
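A minimal sketch of calling GNU sort from Python with the options that matter for large inputs, assuming GNU coreutils `sort` is on the PATH. The wrapper name `gnu_sort` and its defaults are hypothetical; the flags are GNU sort's own: `-S` sets the in-memory buffer size, `-T` chooses where temporary files go (useful when /tmp is small), and `-o` names the output file. Setting `LC_ALL=C` makes sort compare raw bytes, which is usually faster but orders non-ASCII text by byte value rather than by locale rules.

```python
import os
import subprocess


def gnu_sort(in_path, out_path, buffer_size="512M", tmp_dir=None):
    """Hypothetical wrapper: sort a file with GNU sort, which spills to temp files on disk."""
    cmd = ["sort", "-S", buffer_size, "-o", out_path]
    if tmp_dir is not None:
        # Put the temporary run files on a disk with enough free space.
        cmd += ["-T", tmp_dir]
    cmd.append(in_path)
    # LC_ALL=C: byte-wise comparison, usually faster than locale-aware collation.
    env = dict(os.environ, LC_ALL="C")
    subprocess.run(cmd, check=True, env=env)  # raises CalledProcessError on failure
```

The equivalent shell command is `LC_ALL=C sort -S 512M -T /path/with/space -o sorted.log unsorted.log`, where `/path/with/space` is a placeholder for a directory with enough free disk space.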

Upvotes: 2
