TheRealFakeNews
TheRealFakeNews

Reputation: 8153

How to merge N sorted files in to one sorted file without storing in memory?

I have N files that are delineated by a new-line break. Each file has its rows/lines sorted in lexicographic order (the rows themselves don't have to be sorted). For example:

Include any error messages \n
Include details about your goal \n
Describe expected and actual results \n

How do I merge all multiple files so that the output file is sorted without loading all files in memory?

While this is not an algorithm problem per se, it does remind me of the leetcode problem of Merging K Sorted Linked Lists. In this case, a node would be one line in a file.

Upvotes: 0

Views: 400

Answers (1)

Andrej Kesely
Andrej Kesely

Reputation: 195478

Try heapq.merge:

If you have two files:

file1.txt:

aaa
aab
bbb
ooo

file2.txt:

ccc
ddd
zzz

Then:

from heapq import merge

files = ["file1.txt", "file2.txt"]

for m in merge(*map(open, files)):
    print(m.strip())

Prints:

aaa
aab
bbb
ccc
ddd
ooo
zzz

Upvotes: 4

Related Questions