Test

Reputation: 549

Count unique words with collections and dataframe

I want to count the unique words in a DataFrame column, but my function only counts the words in the first sentence.

                          text
0  hello is a unique sentences
1         hello this is a test
2              does this works

import pandas as pd
d = {
    "text": ["hello is a unique sentences",
             "hello this is a test", 
             "does this works"],
}
df = pd.DataFrame(data=d)


from collections import Counter

# Count unique words
def counter_word(text_col):
    print(len(text_col.values))
    count = Counter()
    for i, text in enumerate(text_col.values):
        print(i)
        for word in text.split():
            count[word] += 1
        return count

counter = counter_word(df['text'])
len(counter)

Upvotes: 3

Views: 262

Answers (3)

Stuart

Reputation: 9858

It may be easier and more efficient to stack the words into a single column and then count them with pandas' value_counts instead of Counter:

df["text"].str.split(expand=True).stack().value_counts()
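As a minimal, self-contained sketch of the one-liner above (assuming the same three-row DataFrame as in the question; the names `words` and `word_counts` are only illustrative), the intermediate steps look roughly like this:

```python
import pandas as pd

df = pd.DataFrame({"text": ["hello is a unique sentences",
                            "hello this is a test",
                            "does this works"]})

# Split each row on whitespace: one column per word position,
# with shorter rows padded by missing values
# Stack the columns into a single Series of words
words = df["text"].str.split(expand=True).stack()

# Count occurrences of each word; value_counts ignores missing values
# and returns a Series indexed by word, sorted by count (descending)
word_counts = words.value_counts()

print(len(word_counts))      # number of unique words
print(word_counts["hello"])  # occurrences of one word
```

The result is a pandas Series rather than a Counter, so `len(word_counts)` plays the role of `len(counter)` in the question.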

Upvotes: 1

mozway

Reputation: 260630

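You can use itertools.chain.from_iterable to lazily flatten the per-row word lists and feed the result to Counter:

from collections import Counter
from itertools import chain

# split each row into words, then chain all the word lists together
counter = Counter(chain.from_iterable(map(str.split, df['text'])))

Output: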

Counter({'hello': 2,
         'is': 2,
         'a': 2,
         'unique': 1,
         'sentences': 1,
         'this': 2,
         'test': 1,
         'does': 1,
         'works': 1})

Upvotes: 1

jezrael

Reputation: 862641

I think it is simpler to join the values with a space, then split the result into words and count them:

from collections import Counter

counter = Counter(' '.join(df['text']).split())

print(counter)
Counter({'hello': 2, 'is': 2, 'a': 2, 'this': 2, 'unique': 1, 'sentences': 1, 'test': 1, 'does': 1, 'works': 1})
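For reference, the function in the question stops after the first row because `return count` is indented inside the outer `for` loop, so it returns on the first iteration. A minimal sketch of the corrected function, assuming the same DataFrame as in the question and with the debugging prints removed:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({"text": ["hello is a unique sentences",
                            "hello this is a test",
                            "does this works"]})

def counter_word(text_col):
    """Count how often each whitespace-separated word occurs in a column."""
    count = Counter()
    for text in text_col:
        for word in text.split():
            count[word] += 1
    # Return only after every row has been processed
    # (dedented out of the for loop)
    return count

counter = counter_word(df["text"])
print(len(counter))  # 9 unique words

# Same result as the join/split one-liner
assert counter == Counter(" ".join(df["text"]).split())
```

Both versions give the same counts; the one-liner is just shorter.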

Upvotes: 2
