Reputation: 11793
I have a dataframe like the one below. Each row of the column words contains one or more words separated by ; characters.
import pandas as pd
import numpy as np
dfm = pd.DataFrame({'id': np.arange(5), 'words': ['apple;pear;orange', 'apple', 'pear;grape', 'orange', 'orange;pear']})
I need to count the occurrences of each word across all rows. Here is the output I need:
word count
0 apple 2
1 pear 3
2 orange 3
3 grape 1
Does anyone know how I can achieve that? Thanks.
Upvotes: 1
Views: 158
Reputation: 30258
You can split words on ;, explode() the resulting lists into one row per word, and then call value_counts(), e.g.:
In []:
dfm.words.str.split(';').explode().value_counts()
Out[]:
orange 3
pear 3
apple 2
grape 1
Name: words, dtype: int64
Or you can use groupby() with count(), which orders the result by word rather than by count and returns a DataFrame close to the output being looked for:
In []:
words = dfm.words.str.split(';').explode()
words.groupby(words).count().to_frame('count').reset_index()
Out[]:
words count
0 apple 2
1 grape 1
2 orange 3
3 pear 3
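If the exact row order and column names from the question matter (words in order of first appearance, columns named word and count), something along these lines might work. This is only a sketch: it assumes pandas 0.25 or later for Series.explode(), and the names counts and result are my own, not from the question.

```python
import numpy as np
import pandas as pd

dfm = pd.DataFrame({
    'id': np.arange(5),
    'words': ['apple;pear;orange', 'apple', 'pear;grape', 'orange', 'orange;pear'],
})

# One row per individual word.
words = dfm['words'].str.split(';').explode()

# Count each word, then reorder the counts by first appearance in the data
# (words.unique() keeps the order in which values are first seen).
counts = words.value_counts().reindex(words.unique())

# Name the columns as in the question: 'word' and 'count'.
result = counts.rename_axis('word').reset_index(name='count')
print(result)
```

For the sample data this gives apple 2, pear 3, orange 3, grape 1, in that order. The reindex() step is what restores the first-appearance order, since value_counts() sorts by count by default.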
Upvotes: 2