Reputation: 51
I have a df like this
df a
name | list_of_skills
---------------------------
brian | ['coding in python', 'halo 3']
jon | ['coding in python', 'running', 'sports']
Is there a way I can get some counts of the most common phrases like this:
'coding in python' 2
'halo 3' 1
'running' 1
'sports' 1
Just a simple phrase counter without iterating through each list and comparing it to all the other ones?
Upvotes: 1
Views: 223
Reputation: 862511
Use Series.explode
with Series.value_counts
:
s = df['list_of_skills'].explode().value_counts()
print (s)
coding in python 2
running 1
sports 1
halo 3 1
Name: list_of_skills, dtype: int64
If need DataFrame:
df1 = s.rename_axis('words').reset_index(name='counts')
print (df1)
words counts
0 coding in python 2
1 running 1
2 sports 1
3 halo 3 1
Upvotes: 5
Reputation: 34046
Use df.explode
with Groupby.size
:
In [254]: df = pd.DataFrame({'name':['brian', 'john'], 'list_of_skills':[['coding in python', 'halo 3'], ['coding in python', 'running', 'sports']]})
In [255]: df
Out[255]:
name list_of_skills
0 brian [coding in python, halo 3]
1 john [coding in python, running, sports]
In [258]: ans = df.explode('list_of_skills').groupby('list_of_skills').size()
In [259]: ans
Out[259]:
list_of_skills
coding in python 2
halo 3 1
running 1
sports 1
dtype: int64
Upvotes: 1