Reputation: 1980
I've got a list of about 273,000 words in the list Word_array.
There are about 17,000 unique words, and they're stored in Word_arrayU.
I want a count for each one.
import numpy as np

# make bag of words
Word_arrayU = np.unique(Word_array)
wordBag = [['0', '0'] for _ in range(len(Word_arrayU))]  # preallocate one slot per unique word
i = 0
while i < len(Word_arrayU):  # for each unique word
    wordBag[i][0] = Word_arrayU[i]
    # I think this is the part that takes a long time: summing a list comprehension with a conditional
    wordBag[i][1] = sum([1 if x == Word_arrayU[i] else 0 for x in Word_array])
    i = i + 1
Summing a list comprehension with a conditional just seems sloppy. Is there a better way to do it?
Upvotes: 1
Views: 1022
Reputation: 1108
I don't know about the most 'Pythonic' way, but definitely the easiest way of doing this is to use collections.Counter.
from collections import Counter
Word_array = ["word1", "word2", "word3", "word1", "word2", "word1"]
wordBag = Counter(Word_array).items()
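If the [word, count] layout from the question is needed rather than a view of (word, count) tuples, a minimal sketch along these lines might help. The sample list is a hypothetical stand-in for the real Word_array, and the names counts and top_two are introduced here only for illustration.

```python
from collections import Counter

# Hypothetical sample data standing in for the real Word_array.
Word_array = ["word1", "word2", "word3", "word1", "word2", "word1"]

# One pass over the data builds a mapping of word -> count.
counts = Counter(Word_array)

# Convert to the [word, count] layout used in the question.
wordBag = [[word, n] for word, n in counts.items()]

# most_common(k) returns the k most frequent (word, count) tuples.
top_two = counts.most_common(2)
```

Keeping the Counter object around, instead of only its items(), also allows direct lookups such as counts["word1"].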
Upvotes: 0
Reputation: 1371
from collections import Counter
counter = Counter(Word_array)
the_count_of_some_word = counter["some_word"]
# printing the counts
for word, count in counter.items():
    print("{} appears {} times.".format(word, count))
Upvotes: 2
Reputation: 180441
Since you are already using numpy.unique, just set return_counts=True in the unique call:
import numpy as np
unique, count = np.unique(Word_array, return_counts=True)
That will give you two arrays, the unique elements and their counts:
In [10]: arr = [1,3,2,11,3,4,5,2,3,4]
In [11]: unique, count = np.unique(arr, return_counts=True)
In [12]: unique
Out[12]: array([ 1, 2, 3, 4, 5, 11])
In [13]: count
Out[13]: array([1, 2, 3, 2, 1, 1])
Upvotes: 1
Reputation: 5660
Python lists have a built-in count method (it is not specific to Python 3). For example:
>>> h = ["a", "b", "a", "a", "c"]
>>> h.count("a")
3
>>>
So, if Word_array is a plain Python list, you could make it simpler and somewhat faster by doing something like:
Word_arrayU = np.unique(Word_array)
wordBag = []
for uniqueWord in Word_arrayU:
    wordBag.append([uniqueWord, Word_array.count(uniqueWord)])
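One thing to keep in mind is that list.count walks the entire list on every call, so the loop above still makes one full pass per unique word. A rough sketch comparing it with a single-pass Counter, using hypothetical sample data and illustrative names (by_count, by_counter), could look like this:

```python
from collections import Counter

# Hypothetical sample data; assumed to be a plain list, not a NumPy array.
Word_array = ["a", "b", "a", "a", "c"]

# list.count scans the whole list each time it is called, so this does
# one full pass over Word_array per unique word.
by_count = {word: Word_array.count(word) for word in set(Word_array)}

# Counter visits each element once, regardless of how many unique words exist.
by_counter = dict(Counter(Word_array))

# Both approaches agree on the result; only the amount of work differs.
results_match = by_count == by_counter
```

With roughly 17,000 unique words and 273,000 entries, the repeated scans are likely where most of the time goes.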
Upvotes: 0
Reputation: 60994
If you want a less efficient (than Counter) but more transparent solution, you can use collections.defaultdict:
from collections import defaultdict
my_counter = defaultdict(int)
for word in Word_array:
    my_counter[word] += 1
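As a quick check that this gives the same counts as Counter, and to show one behavioural difference between the two, here is a small sketch. The sample list and the key "missing" are hypothetical and only there for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample data standing in for the real word list.
Word_array = ["foo", "bar", "foo"]

# defaultdict(int) supplies 0 for unseen keys, so += 1 works without setup.
my_counter = defaultdict(int)
for word in Word_array:
    my_counter[word] += 1

reference = Counter(Word_array)
same_counts = dict(my_counter) == dict(reference)

# Difference: merely reading a missing key inserts it into a defaultdict,
# while a Counter returns 0 without storing anything.
_ = my_counter["missing"]
_ = reference["missing"]
inserted_in_defaultdict = "missing" in my_counter
inserted_in_counter = "missing" in reference
```

If stray keys from lookups would be a problem, my_counter.get(word, 0) reads a value without inserting it.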
Upvotes: -1
Reputation: 13024
Building on the suggestion from @jonrsharpe...
from collections import Counter
words = Counter()
words['foo'] += 1
words['foo'] += 1
words['bar'] += 1
Output
Counter({'foo': 2, 'bar': 1})
It's really convenient because you don't have to initialize a key before incrementing it.
You can also initialize directly from a list of words:
Counter(['foo', 'foo', 'bar'])
Output
Counter({'foo': 2, 'bar': 1})
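The two styles can also be combined: a Counter built from one list can absorb further words later through its update method, which adds to existing counts rather than replacing them. A minimal sketch, with made-up words and an illustrative total variable:

```python
from collections import Counter

# Start from an initial batch of words.
words = Counter(['foo', 'foo', 'bar'])

# update() with an iterable adds to the existing counts.
words.update(['bar', 'baz'])

# Total number of words seen so far.
total = sum(words.values())
```

This can be handy when the words arrive in chunks, for example when reading a large file line by line.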
Upvotes: 0