Gwen Yang
Gwen Yang

Reputation: 146

Count frequency of each word contained in column string values

For example, I have a dataframe like this:

data = {'id': [1,1,1,2,2],
        'value': ['red','red and blue','yellow','oak','oak wood']
}
df = pd.DataFrame (data, columns = ['id','value'])

I want :

id value   count
1  red     2
1  blue    1
1  yellow  1
2  oak     2
2  wood    1

Many thanks!

Upvotes: 2

Views: 56

Answers (1)

jezrael
jezrael

Reputation: 862406

Solution for pandas 0.25+ with DataFrame.explode by lists created by Series.str.split and GroupBy.size:

df1 = (df.assign(value = df['value'].str.split())
         .explode('value')
         .groupby(['id','value'], sort=False)
         .size()
         .reset_index(name='count'))
print (df1)
   id   value  count
0   1     red      2
1   1     and      1
2   1    blue      1
3   1  yellow      1
4   2     oak      2
5   2    wood      1

For lower pandas versions use DataFrame.set_index with Series.str.split and expand=True for DataFrame, reshape by DataFrame.stack, create columns from MultiIndex Series ands use same solution like above:

df1 = (df.set_index('id')['value']
         .str.split(expand=True)
         .stack()
         .reset_index(name='value')
         .groupby(['id','value'], sort=False)
         .size()
         .reset_index(name='count')
         )
print (df1)
   id   value  count
0   1     red      2
1   1     and      1
2   1    blue      1
3   1  yellow      1
4   2     oak      2
5   2    wood      1

Upvotes: 4

Related Questions