programmingtech

Reputation: 428

Count the occurrences of words in list values across all rows of a DataFrame

I have a DataFrame in which one column contains a list of values in each row. I want to count how many times each word inside those lists occurs across all rows.

For example, given this DataFrame df:

Column A    Column B
animal      [cat, dog, tiger]
place       [italy, china, japan]
pets        [cat, dog]
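For anyone who wants to try the answers, the example DataFrame above can be rebuilt as follows. This sketch assumes the column labels are literally 'Column A' and 'Column B' and that each cell in 'Column B' is a Python list of strings (not a string that merely looks like a list):

```python
import pandas as pd

# Rebuild the example: 'Column B' holds one Python list per row
df = pd.DataFrame({
    'Column A': ['animal', 'place', 'pets'],
    'Column B': [['cat', 'dog', 'tiger'], ['italy', 'china', 'japan'], ['cat', 'dog']],
})
print(df)
```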

I need the following result:

cat: 2
dog: 2
tiger: 1
(and so on for the remaining words)

Upvotes: 0

Views: 1214

Answers (2)

satyam soni

Reputation: 269

Use Counter from collections to count the values in each row, then print them. See the code below for reference.

import pandas as pd

#for counting the elements
from collections import Counter

#dataframe with list values in column B
df = pd.DataFrame([[1,['apple','mango','apple'],3],[1,['mango','mango','soni'],3]],columns=['A','B','C'])

# count and print the values of each row separately
for i, row in df.iterrows():
    c = Counter(row['B'])
    print(f'for index {i}')
    for k, v in c.items():
        print(f'{k}: {v}')
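The loop above reports counts separately for each row, whereas the question asks for totals across all rows. A minimal sketch of getting those totals with the same tool is to keep one running Counter and call its update method with each row's list; the toy DataFrame is the same one used in this answer:

```python
import pandas as pd
from collections import Counter

df = pd.DataFrame([[1, ['apple', 'mango', 'apple'], 3],
                   [1, ['mango', 'mango', 'soni'], 3]],
                  columns=['A', 'B', 'C'])

# One running Counter for the whole column
total = Counter()
for words in df['B']:
    total.update(words)  # add this row's words to the running totals

for word, count in total.items():
    print(f'{word}: {count}')
```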

Upvotes: 0

jezrael

Reputation: 862601

You need to flatten the values into a single list and then count them, either with Counter or with Series.value_counts:

import pandas as pd
from collections import Counter

# flatten the lists from all rows into one list, then count each value
s = pd.Series(Counter([y for x in df['Column B'] for y in x]))
print(s)
cat      2
dog      2
tiger    1
italy    1
china    1
japan    1
dtype: int64
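The question asks for output in the "cat: 2" form. Assuming df is the question's DataFrame (rebuilt here so the snippet runs on its own), the Series produced above can be printed that way by iterating over Series.items():

```python
import pandas as pd
from collections import Counter

df = pd.DataFrame({
    'Column A': ['animal', 'place', 'pets'],
    'Column B': [['cat', 'dog', 'tiger'], ['italy', 'china', 'japan'], ['cat', 'dog']],
})

# Flatten the lists of all rows into one list, then count each word
s = pd.Series(Counter([y for x in df['Column B'] for y in x]))

# Print each word with its count in the requested "word: count" form
for word, count in s.items():
    print(f'{word}: {count}')
```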

Alternative 1:

import pandas as pd
from itertools import chain
from collections import Counter

# chain.from_iterable flattens the lists by one level
s = pd.Series(Counter(chain.from_iterable(df['Column B'])))

Alternative 2:

import numpy as np
import pandas as pd

s = pd.Series(np.concatenate(df['Column B'])).value_counts()

Slower alternative, best avoided with large data because summing lists builds a new concatenated list at every step:

s = pd.Series(df['Column B'].sum()).value_counts()
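Another option, not one of the alternatives listed in this answer and requiring pandas 0.25 or later, is Series.explode, which turns each list element into its own row so that value_counts can count them directly. A minimal sketch on the question's example data:

```python
import pandas as pd

df = pd.DataFrame({
    'Column A': ['animal', 'place', 'pets'],
    'Column B': [['cat', 'dog', 'tiger'], ['italy', 'china', 'japan'], ['cat', 'dog']],
})

# explode() gives one row per list element (original index labels repeat)
s = df['Column B'].explode().value_counts()
print(s)
```

Because explode turns an empty list into NaN and value_counts drops NaN by default, rows with empty lists do not affect the counts.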

Upvotes: 2
