Rstevoa

Reputation: 271

How to sort a large text file alphabetically?

So I have a text file and I need to sort the lines alphabetically. Example input:

This is the first sentence
A sentence here as well
But how do I reorder them?

Output:

A sentence here as well
But how do I reorder them?
This is the first sentence

Here's the thing: this file is so large that I don't have enough RAM to read it into a list/array. I tried Python's built-in sorted() function and the process got killed.

To give you an idea:

wc -l data
21788172 data

Upvotes: 6

Views: 1538

Answers (2)

Dan Loewenherz

Reputation: 11236

Similar to what Hugh recommended (though this isn't a pure-Python solution), you could sort the file in chunks. For example, split the file into 26 smaller files (A.txt, B.txt, C.txt, etc.), one per first letter. Sort each of those individually, then concatenate them in order to get the final result.

The main thing to keep in mind is that the first pass through the source file only distributes the lines into files according to their first letter. Only after that do you sort each file. A simple cat A.txt B.txt ... will handle the rest.
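
For anyone who wants the same idea in pure Python, here is a minimal sketch. It rests on assumptions not stated above: the file is UTF-8 text, "alphabetically" means plain code-point order (what sorted() gives you), and each bucket fits in RAM on its own. The function name sort_by_first_char and the bucket file names are made up for illustration. It buckets on the exact first character rather than on 26 letters, so lines starting with digits, lowercase letters, or punctuation are handled too.

```python
import os
import tempfile


def sort_by_first_char(src_path, dst_path):
    """Sort lines by bucketing on the first character (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as workdir:
        handles = {}  # first character -> open bucket file
        try:
            # Pass 1: distribute lines into one bucket file per first character.
            with open(src_path, encoding="utf-8") as src:
                for line in src:
                    if not line.endswith("\n"):
                        line += "\n"  # normalize a final line with no newline
                    key = line[0]
                    if key not in handles:
                        path = os.path.join(workdir, f"{ord(key)}.txt")
                        handles[key] = open(path, "w", encoding="utf-8")
                    handles[key].write(line)
        finally:
            for fh in handles.values():
                fh.close()

        # Pass 2: sort each bucket in memory and append the buckets in key order.
        # Assumption: every single bucket is small enough to fit in RAM.
        with open(dst_path, "w", encoding="utf-8") as dst:
            for key in sorted(handles):
                path = os.path.join(workdir, f"{ord(key)}.txt")
                with open(path, encoding="utf-8") as bucket:
                    dst.writelines(sorted(bucket))
```

Usage would be something like sort_by_first_char("data", "data.sorted"). If the lines are unevenly distributed (say, most of them start with the same letter), one bucket can still be too big for memory, and in that case the merge-based approach is the safer choice.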

Upvotes: 1

Hugh Bothwell

Reputation: 56634

It sounds like you need to do a merge sort: divide the file into blocks, sort each block, then merge the sorted blocks back together. See "Python class to merge sorted files, how can this be improved?"
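
A rough sketch of those three steps using only the standard library, not the class from the linked question. Assumptions: UTF-8 text, plain code-point ordering, and a chunk of lines_per_chunk lines fits comfortably in memory. The name external_sort and the default chunk size are illustrative only. heapq.merge does the merge step lazily, so only one line per chunk file is held in memory at a time.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(src_path, dst_path, lines_per_chunk=1_000_000):
    """Sort a large text file by lines with bounded memory (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as workdir:
        chunk_paths = []

        # Steps 1 and 2: read a block of lines, sort it, write it to its own file.
        with open(src_path, encoding="utf-8") as src:
            while True:
                chunk = list(itertools.islice(src, lines_per_chunk))
                if not chunk:
                    break
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"  # normalize a final line with no newline
                chunk.sort()
                path = os.path.join(workdir, f"chunk{len(chunk_paths)}.txt")
                with open(path, "w", encoding="utf-8") as out:
                    out.writelines(chunk)
                chunk_paths.append(path)

        # Step 3: merge the sorted chunk files line by line.
        files = [open(p, encoding="utf-8") for p in chunk_paths]
        try:
            with open(dst_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*files))
        finally:
            for fh in files:
                fh.close()
```

With the roughly 21.8 million lines in the question and the default chunk size, this would produce 22 temporary files, which is well within typical open-file limits. Lower lines_per_chunk if a single chunk still does not fit in RAM.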

Upvotes: 5
