I have a pandas Series of long strings.
I want to get the value counts of the words across the entire Series. I tried:
df.value_counts().to_dict()
But that counts whole strings, not individual words.
How can I do this efficiently?
My Series looks like this:
print(df.head(3))
0 4632 N. Rockwell Street, Chicago Rockwell Neighborhood 773 60625 4748 N Kedzie
1 4632 N. Rockwell Street, Chicago Rockwell' Bdoy 773 60625 4632 N Rock
2 4632 N. Rockwell Street, LA Rock hood Grill 773 60625 3658 W Lawren
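To show why the value_counts attempt does not help here: value_counts compares entire strings, so with the three rows as displayed above (the real strings may be longer, since the display could be truncated), each distinct row simply gets a count of 1 and no individual word appears as a key. The series below is a hypothetical reconstruction of those displayed rows.

```python
import pandas as pd

# Hypothetical reconstruction of the three rows shown in the question
df = pd.Series([
    "4632 N. Rockwell Street, Chicago Rockwell Neighborhood 773 60625 4748 N Kedzie",
    "4632 N. Rockwell Street, Chicago Rockwell' Bdoy 773 60625 4632 N Rock",
    "4632 N. Rockwell Street, LA Rock hood Grill 773 60625 3658 W Lawren",
])

# value_counts() treats each whole string as one value, so every
# distinct row maps to a count of 1 and no single word is counted
string_counts = df.value_counts().to_dict()
print(string_counts)
```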
I want to generate a dictionary like this:
a['4632'] = 3
a['Rockwell'] = 3
a['LA'] = 1
and so on.
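A note on the expected dictionary: the counts of 3 for '4632' and 'Rockwell' equal the number of rows that contain each word. Counting every occurrence gives 4 for both, because row 1 contains '4632' twice and row 0 contains 'Rockwell' twice. If per-row counting is what is wanted (an assumption, the question does not say), a minimal sketch is to de-duplicate the words inside each row before counting; the series is again a hypothetical reconstruction of the displayed rows.

```python
from collections import Counter

import pandas as pd

# Hypothetical reconstruction of the three rows shown in the question
df = pd.Series([
    "4632 N. Rockwell Street, Chicago Rockwell Neighborhood 773 60625 4748 N Kedzie",
    "4632 N. Rockwell Street, Chicago Rockwell' Bdoy 773 60625 4632 N Rock",
    "4632 N. Rockwell Street, LA Rock hood Grill 773 60625 3658 W Lawren",
])

# set() drops repeated words within a row, so each word is counted
# at most once per row (number of rows containing the word)
row_counts = Counter(word for text in df for word in set(text.split()))
a = dict(row_counts)
print(a["4632"], a["Rockwell"], a["LA"])  # 3 3 1
```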
I think a pure Python solution with Counter is better here: join all values into one long string, split it into words, and count them:
from collections import Counter
# join all rows into one long string, split on whitespace, count each word
d = Counter(' '.join(df).split())
# if a plain dict is needed, convert it:
#d = dict(Counter(' '.join(df).split()))
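A runnable sketch of the Counter approach above on the three displayed rows (a hypothetical reconstruction). It assumes missing values should be skipped: ' '.join raises a TypeError on NaN or other non-string entries, so the sketch drops them and casts the rest to str before joining. The None entry is added only to exercise that step.

```python
from collections import Counter

import pandas as pd

# Hypothetical reconstruction of the displayed rows, plus one missing value
df = pd.Series([
    "4632 N. Rockwell Street, Chicago Rockwell Neighborhood 773 60625 4748 N Kedzie",
    "4632 N. Rockwell Street, Chicago Rockwell' Bdoy 773 60625 4632 N Rock",
    "4632 N. Rockwell Street, LA Rock hood Grill 773 60625 3658 W Lawren",
    None,
])

# dropna() skips missing values; astype(str) guards against non-string entries
words = " ".join(df.dropna().astype(str)).split()
d = Counter(words)

print(d["Rockwell"], d["4632"], d["LA"])  # 4 4 1
print(d.most_common(3))  # the three most frequent words with their counts
```

Counter returns 0 for a word that never occurs instead of raising KeyError, which is convenient for lookups; after dict(d) that behaviour is lost.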
Or use str.split(expand=True) followed by stack, so that all words end up in a single Series before value_counts:
d = df.str.split(expand=True).stack().value_counts().to_dict()
print (d)
{'Rockwell': 4, '4632': 4, 'Street,': 3, '773': 3, '60625': 3, 'N.': 3, 'N': 2, 'Rock': 2, 'Chicago': 2, 'Kedzie': 1, 'Grill': 1, 'Neighborhood': 1, '3658': 1, 'Lawren': 1, 'W': 1, '4748': 1, 'LA': 1, 'hood': 1, "Rockwell'": 1, 'Bdoy': 1}
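A sketch of two variants of the split/stack approach, again on a hypothetical reconstruction of the displayed rows. Series.explode (available since pandas 0.25) turns each list of words into one row per word, which avoids the None-padded columns that expand=True creates for shorter rows. The second variant swaps in a different tokenization: str.findall(r"\w+") keeps only runs of word characters, so "Street," becomes "Street" and "Rockwell'" becomes "Rockwell".

```python
import pandas as pd

# Hypothetical reconstruction of the three rows shown in the question
df = pd.Series([
    "4632 N. Rockwell Street, Chicago Rockwell Neighborhood 773 60625 4748 N Kedzie",
    "4632 N. Rockwell Street, Chicago Rockwell' Bdoy 773 60625 4632 N Rock",
    "4632 N. Rockwell Street, LA Rock hood Grill 773 60625 3658 W Lawren",
])

# split() gives one list per row; explode() gives one row per word
d = df.str.split().explode().value_counts().to_dict()
print(d["Rockwell"], d["4632"])  # 4 4

# Variant: tokenize on runs of word characters, dropping punctuation
d_clean = df.str.findall(r"\w+").explode().value_counts().to_dict()
print(d_clean["Rockwell"], d_clean["Street"])  # 5 3
```

The regex variant is a trade-off: it also merges "N." with "N" (5 occurrences), which may or may not be desirable for address data.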