Reputation: 434
My code creates a vector-based bag-of-words for every document I process.
It works and prints the frequency of every single word in the document. Additionally, I would like to print each word right in front of its count, like this:
['word', 15]
I tried it on my own. What I get right now looks like this:
This is my code:
for doc in docsClean:
    bag_vector = np.zeros(len(doc))
    for w in doc:
        for i,word in enumerate(doc):
            if word == w:
                bag_vector[i] += 1
    print(bag_vector)
    print("{0},{1}\n".format(w,bag_vector[i]))
Upvotes: 0
Views: 365
Reputation: 5965
I would suggest using a dict to store the frequency of each word.
The Python standard library already provides a class for this: collections.Counter.
from collections import Counter
# Random words
words = ['lacteal', 'brominating', 'postmycotic', 'legazpi', 'enclosing', 'arytaenoid', 'brominating', 'postmycotic', 'legazpi', 'enclosing']
frequency = Counter(words)
print(frequency)
Output:
Counter({'brominating': 2, 'postmycotic': 2, 'legazpi': 2, 'enclosing': 2, 'lacteal': 1, 'arytaenoid': 1})
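To get the ['word', 15] layout from the question, one option is to loop over the Counter and build a two-element list per word. This is only a minimal sketch: word_count_pairs is a hypothetical helper name, not something from the question or from the standard library, and it relies on Counter.most_common() to order the words by descending count.

```python
from collections import Counter

def word_count_pairs(words):
    """Return [word, count] pairs, most frequent words first.

    Hypothetical helper for illustration; `words` is assumed to be
    a list of already-tokenized strings.
    """
    return [[word, count] for word, count in Counter(words).most_common()]

# Same sample words as above
words = ['lacteal', 'brominating', 'postmycotic', 'legazpi', 'enclosing',
         'arytaenoid', 'brominating', 'postmycotic', 'legazpi', 'enclosing']

# Print one [word, count] pair per line
for pair in word_count_pairs(words):
    print(pair)
```

If the order of first appearance matters more than the frequency ranking, iterating over Counter(words).items() instead of most_common() should keep the words in insertion order.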
If, for any reason, you don't want to use collections.Counter, here is some simple code that does the same task.
words = ['lacteal', 'brominating', 'postmycotic', 'legazpi', 'enclosing', 'arytaenoid', 'brominating', 'postmycotic', 'legazpi', 'enclosing']
freq = {} # Empty dict
for word in words:
    freq[word] = freq.get(word, 0) + 1
print(freq)
This code works by adding 1 to the frequency of word if it is already present in freq; otherwise freq.get(word, 0) returns 0, so the frequency of a new word gets stored as 1.
Output:
{'lacteal': 1, 'brominating': 2, 'postmycotic': 2, 'legazpi': 2, 'enclosing': 2, 'arytaenoid': 1}
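Applied to the loop from the question, the same idea might look like the sketch below. It assumes docsClean is a list of documents, each one a list of word strings; the sample data in docs_clean and the helper name bag_of_words are made up for illustration.

```python
def bag_of_words(doc):
    """Count how often each word occurs in one tokenized document."""
    freq = {}
    for word in doc:
        # get() returns 0 for a word that has not been seen yet
        freq[word] = freq.get(word, 0) + 1
    return freq

# Hypothetical stand-in for the question's docsClean
docs_clean = [
    ['the', 'cat', 'sat'],
    ['the', 'dog', 'the', 'end'],
]

for doc in docs_clean:
    # One [word, count] line per distinct word in this document
    for word, count in bag_of_words(doc).items():
        print([word, count])
    print()  # blank line between documents
```

Keying the counts by word rather than by position means each distinct word is reported once per document, with its label attached.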
Upvotes: 2