ste92

Reputation: 434

Print corresponding words to word-counts (Bag-of-Words)

My code creates a vector-based bag-of-words for every document I process.

It works and prints the frequency of every word in the document. Additionally, I would like to print each word right in front of its count, like this:

['word', 15]

I tried it on my own, but the output I get right now is not in that format.

This is my code:

for doc in docsClean:

    bag_vector = np.zeros(len(doc))

    for w in doc:
        for i,word in enumerate(doc):
            if word == w:
                bag_vector[i] += 1

    print(bag_vector)
    print("{0},{1}\n".format(w,bag_vector[i]))

Upvotes: 0

Views: 365

Answers (1)

Diptangsu Goswami

Reputation: 5965

I would suggest using a dict to store the frequency of each word.

Python's standard library already has a class for this: collections.Counter.

from collections import Counter

# Random words
words = ['lacteal', 'brominating', 'postmycotic', 'legazpi', 'enclosing', 'arytaenoid', 'brominating', 'postmycotic', 'legazpi', 'enclosing']
frequency = Counter(words)

print(frequency)

Output:

Counter({'brominating': 2, 'postmycotic': 2, 'legazpi': 2, 'enclosing': 2, 'lacteal': 1, 'arytaenoid': 1})
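To get the ['word', count] format from the question, one option is to loop over the counter's entries. This is only a minimal sketch that reuses the sample `words` list from above; `pairs` is a name I made up for illustration.

```python
from collections import Counter

words = ['lacteal', 'brominating', 'postmycotic', 'legazpi', 'enclosing',
         'arytaenoid', 'brominating', 'postmycotic', 'legazpi', 'enclosing']
frequency = Counter(words)

# most_common() yields (word, count) tuples, highest count first.
# Each tuple is turned into a two-element list to match the requested format.
pairs = [[word, count] for word, count in frequency.most_common()]

for pair in pairs:
    print(pair)
```

If you would rather keep the words in the order they first appear, `frequency.items()` can be used in place of `frequency.most_common()`.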

If, for any reason, you don't want to use collections.Counter, here is a simple snippet that does the same thing with a plain dict.

words = ['lacteal', 'brominating', 'postmycotic', 'legazpi', 'enclosing', 'arytaenoid', 'brominating', 'postmycotic', 'legazpi', 'enclosing']

freq = {}  # Empty dict

for word in words:
    freq[word] = freq.get(word, 0) + 1

print(freq)

This works because freq.get(word, 0) returns the current count if word is already a key in freq and 0 otherwise. Adding 1 to that value increments an existing count, and a new word gets stored with a frequency of 1.

Output:

{'lacteal': 1, 'brominating': 2, 'postmycotic': 2, 'legazpi': 2, 'enclosing': 2, 'arytaenoid': 1}
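Applied to the loop in the question, the same idea could look roughly like the sketch below. I am assuming docsClean is a list of documents, each already split into a list of words; `docs_clean` and `word_counts` are hypothetical names with made-up sample data.

```python
from collections import Counter

# Hypothetical stand-in for docsClean: one list of tokens per document
docs_clean = [
    ['the', 'cat', 'sat', 'on', 'the', 'mat'],
    ['the', 'dog', 'barked'],
]


def word_counts(doc):
    """Return [word, count] pairs for one tokenized document."""
    # Counter keeps keys in first-seen order, so the pairs follow the text
    return [[word, count] for word, count in Counter(doc).items()]


for doc in docs_clean:
    for pair in word_counts(doc):
        print(pair)
    print()  # blank line between documents
```

Note that in the question's code the final print sits after the inner loops have finished, so w and i only hold the last word and the last index of each document. Printing inside the loop over the counted pairs, as above, avoids that.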

Upvotes: 2
