zesla

Reputation: 11793

Count occurrence of words from a string column in pandas

I have a dataframe like the one below. Each row of the words column contains one or more words separated by ;.

import pandas as pd
import numpy as np
dfm = pd.DataFrame({'id': np.arange(5), 'words': ['apple;pear;orange', 'apple', 'pear;grape', 'orange', 'orange;pear']})

I need to count the occurrence of the words. Here is the output I need:

    word    count
0   apple   2
1   pear    3
2   orange  3
3   grape   1

Does anyone know how I can achieve that? Thanks.

Upvotes: 1

Views: 158

Answers (1)

AChampion

Reputation: 30258

You can split the words column on ;, explode() the resulting lists into one row per word, and then call value_counts(), e.g.:

In []:
dfm.words.str.split(';').explode().value_counts()

Out[]:
orange    3
pear      3
apple     2
grape     1
Name: words, dtype: int64
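
If a dataframe with word and count columns is preferred over the bare Series, something like the following should work. It is a sketch rather than part of the original answer: the column names word and count are taken from the desired output in the question, and rename_axis() together with reset_index(name=...) is used so that the labels do not depend on what value_counts() names its result in a given pandas version.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
dfm = pd.DataFrame({
    'id': np.arange(5),
    'words': ['apple;pear;orange', 'apple', 'pear;grape', 'orange', 'orange;pear'],
})

counts = (
    dfm['words']
    .str.split(';')              # each cell becomes a list of words
    .explode()                   # one row per word
    .value_counts()              # count occurrences, most frequent first
    .rename_axis('word')         # name the index so it becomes the 'word' column
    .reset_index(name='count')   # turn the Series into a two-column dataframe
)
print(counts)
```

The rows are ordered by descending count; words with the same count (orange and pear here) may appear in either order.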

Or you can use groupby(), which orders the result by word rather than by count and returns a dataframe close to the one you asked for:

In []:
words = dfm.words.str.split(';').explode()
words.groupby(words).count().to_frame('count').reset_index()

Out[]:
    words  count
0   apple      2
1   grape      1
2  orange      3
3    pear      3
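
The groupby() result above is sorted alphabetically, while the question lists the words in order of first appearance and labels the column word. A possible variation, assuming that ordering matters, is to pass sort=False to groupby() and rename the columns. The names word and count are taken from the question; the rest is a sketch, not the only way to do it.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
dfm = pd.DataFrame({
    'id': np.arange(5),
    'words': ['apple;pear;orange', 'apple', 'pear;grape', 'orange', 'orange;pear'],
})

# One row per word.
words = dfm['words'].str.split(';').explode()

result = (
    words.groupby(words, sort=False)  # sort=False keeps groups in order of first appearance
    .size()                           # number of rows in each group
    .rename_axis('word')              # index name becomes the 'word' column
    .reset_index(name='count')
)
print(result)
```

With the sample data this should list apple, pear, orange and grape in that order, with counts 2, 3, 3 and 1.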

Upvotes: 2
