user3204121

Reputation: 59

More efficient way of concatenating a huge number of lists?

I have a huge number of sentences (just over 100,000), each containing about 10 words on average. I am trying to put them together into one big list so I can use Counter from the collections library to show me how often each word occurs. What I'm doing currently is this:

from collections import Counter
words = []
for sentence in sentenceList:
    words = words + sentence.split()
counts = Counter(words)

I was wondering if there is a way to do the same thing more efficiently. I've been waiting almost an hour for this code to finish executing. I suspect the concatenation is what makes it take so long, since if I replace the line words = words + sentence.split() with print(sentence.split()), it finishes in seconds. Any help would be much appreciated.
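
A rough repro on made-up sentences (exact timings will vary by machine); if the concatenation really is quadratic, doubling n should roughly quadruple the time:

import timeit

for n in (5000, 10000, 20000):
    sentences = ["the quick brown fox jumps over the lazy dog"] * n

    def build():
        words = []
        for s in sentences:
            words = words + s.split()  # copies the whole list every iteration
        return words

    print(n, timeit.timeit(build, number=1))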

Upvotes: 1

Views: 96

Answers (2)

iz_

Reputation: 16623

You can use extend, which appends to the list in place. Your loop's words = words + sentence.split() builds a brand-new list each time, copying every word accumulated so far, so the total work grows quadratically:

from collections import Counter
words = []
for sentence in sentenceList:
    words.extend(sentence.split())
counts = Counter(words)

Or, a list comprehension like so:

words = [word for sentence in sentenceList for word in sentence.split()]

If you don't need words later, you can pass a generator into Counter:

counts = Counter(word for sentence in sentenceList for word in sentence.split())
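
Either way, counts is an ordinary Counter, so the most frequent words are directly available:

print(counts.most_common(10))  # 10 most common words with their counts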

Upvotes: 2

blhsing

Reputation: 106891

Don't build a big, memory-hogging list if all you want to do is count the elements. Keep updating the Counter object with new iterables instead:

counts = Counter()
for sentence in sentenceList:
    counts.update(sentence.split())
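
The same idea can be written as a single pass with itertools.chain.from_iterable, which streams every word into Counter without building any intermediate list (assuming sentenceList is the list of sentences from the question):

from collections import Counter
from itertools import chain

counts = Counter(chain.from_iterable(s.split() for s in sentenceList))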

Upvotes: 3
