Anu

Reputation: 1

How can I count word frequencies (one word per line) in a huge file (2 GB)?

I am trying to write a program that creates a file of English words, approximately 2 GB in size, with one word per line. From this file I then want to print the frequency of each word using external sorting. Once the file is sorted, identical words are adjacent, so the program can just count each run and print the frequency.
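
For reference, the external-sort idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tuned implementation: the function names (`write_sorted_runs`, `word_frequencies`) and the `lines_per_run` parameter are hypothetical, the input is assumed to hold one word per line, and the number of temporary run files is assumed to stay below the operating system's open-file limit.

```python
import heapq
import itertools
import os
import tempfile


def write_sorted_runs(path, lines_per_run=1_000_000):
    """Split the input into sorted temporary files (runs); return their paths."""
    run_paths = []
    with open(path) as src:
        while True:
            # Read at most lines_per_run lines so only one chunk is in memory.
            chunk = [line.strip() for line in itertools.islice(src, lines_per_run)]
            if not chunk:
                break
            words = sorted(w for w in chunk if w)  # drop blank lines
            fd, run_path = tempfile.mkstemp(suffix=".run", text=True)
            with os.fdopen(fd, "w") as run:
                run.writelines(w + "\n" for w in words)
            run_paths.append(run_path)
    return run_paths


def read_words(handle):
    """Yield the words of one run file without their trailing newline."""
    for line in handle:
        yield line.rstrip("\n")


def word_frequencies(path, lines_per_run=1_000_000):
    """Yield (word, count) pairs in sorted word order via an external merge."""
    run_paths = write_sorted_runs(path, lines_per_run)
    handles = [open(p) for p in run_paths]
    try:
        # heapq.merge lazily merges the already-sorted runs, so equal words
        # come out adjacent and groupby can count each run of duplicates.
        merged = heapq.merge(*(read_words(h) for h in handles))
        for word, group in itertools.groupby(merged):
            yield word, sum(1 for _ in group)
    finally:
        for h in handles:
            h.close()
        for p in run_paths:
            os.remove(p)
```

Iterating `for word, count in word_frequencies(path): print(word, count)` prints one frequency per distinct word. Memory use is bounded by `lines_per_run`, at the cost of writing the data to disk once more.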

Upvotes: 0

Views: 830

Answers (1)

Katriel

Reputation: 123772

Python has a built-in function, sorted, which sorts an iterable. But even better than that, in versions 2.7 and greater the standard library has collections.Counter, a collection for counting the frequencies of things. Assuming your large file has one word per line, you can do:

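from collections import Counter

# Replace <giant-dictionary> with the path to your word file.
with open("<giant-dictionary>") as words:
    # Strip the trailing newline so each key is the bare word.
    counts = Counter(line.strip() for line in words)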

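This will take a few minutes. The file is read line by line, so memory use grows with the number of distinct words, not with the size of the file.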
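
To actually print the frequencies, something like the following should work. It is a minimal sketch, assuming one word per line. The helper names `count_words` and `print_frequencies` are made up for illustration, while `Counter.most_common` is the standard method for listing entries from most to least frequent.

```python
from collections import Counter


def count_words(path):
    """Count each word in a one-word-per-line file, skipping blank lines."""
    with open(path) as words:
        stripped = (line.strip() for line in words)
        return Counter(word for word in stripped if word)


def print_frequencies(counts, limit=None):
    """Print word and count, most frequent first; limit=None prints all."""
    for word, n in counts.most_common(limit):
        print(f"{word}\t{n}")
```

Calling `print_frequencies(count_words(path), limit=10)` would print the ten most common words with their counts.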

Upvotes: 3
