Reputation: 59
I have a huge number of sentences (just over 100,000), each containing on average 10 words. I am trying to put them together into one big list so I can use Counter from the collections library to show me how frequently each word occurs. What I'm doing currently is this:
from collections import Counter

words = []
for sentence in sentenceList:
    words = words + sentence.split()

counts = Counter(words)
I was wondering if there is a way to do this more efficiently. I've been waiting almost an hour for the code to finish executing. I suspect the concatenation is what makes it so slow, since if I replace the line words = words + sentence.split() with print(sentence.split()), it finishes in seconds. Any help would be much appreciated.
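For what it's worth, here is a rough timing sketch on synthetic sentences (not my real data) that seems to show the concatenation loop slowing down disproportionately as the list grows:

import timeit

# Rough sketch: time only the concatenation loop on growing subsets of
# synthetic sentences to see how the cost scales.
sentences = ["the quick brown fox jumps over the lazy dog today"] * 8000

def build(n):
    words = []
    for sentence in sentences[:n]:
        words = words + sentence.split()  # allocates a brand-new list each time
    return words

for n in (2000, 4000, 8000):
    print(n, timeit.timeit(lambda: build(n), number=1))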
Upvotes: 1
Views: 96
Reputation: 16623
You can use extend:
from collections import Counter

words = []
for sentence in sentenceList:
    words.extend(sentence.split())

counts = Counter(words)
Or use a list comprehension, like so:
words = [word for sentence in sentenceList for word in sentence.split()]
If you don't need words later, you can pass a generator into Counter:
counts = Counter(word for sentence in sentenceList for word in sentence.split())
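Either way, the resulting Counter gives you the word frequencies directly. A small usage sketch (assuming counts has been built with one of the approaches above):

print(counts.most_common(10))  # the 10 most frequent words with their counts
print(counts["the"])           # count for a single word (0 if it never appears)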
Upvotes: 2
Reputation: 106891
Don't build a big, memory-hogging list if all you want to do is count the elements. Keep updating the Counter object with new iterables instead:
from collections import Counter

counts = Counter()
for sentence in sentenceList:
    counts.update(sentence.split())
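Note that Counter.update adds to the existing counts rather than replacing them (unlike dict.update), which is what makes this incremental approach work. A tiny illustration:

from collections import Counter

counts = Counter()
counts.update("a b a".split())  # 'a' -> 2, 'b' -> 1
counts.update("b c".split())    # 'a' -> 2, 'b' -> 2, 'c' -> 1
print(counts)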
Upvotes: 3