Reputation: 115
I have a pandas DataFrame, say:
1. oshin oshin1 oshin2
2. oshin3 oshin2 oshin4
I want to count how often each word occurs, so my output should be:
oshin:1
oshin1:1
oshin2:2
oshin3:1
oshin4:1
I want to export the output to a CSV file, as it is going to be really long. How do I do it in pandas? Or, for that matter, how can I do it for any column in pandas?
Upvotes: 1
Views: 972
Reputation: 862641
I think you first need to create lists in each column with apply and split, then convert to a numpy array with values and flatten it with numpy.ravel. Convert the result to a list with tolist, join the inner lists with np.concatenate, apply Counter, and finally convert to a dict:
print (df)
col
0 oshin oshin1 oshin2
1 oshin3 oshin2 oshin4
import numpy as np
import pandas as pd
from collections import Counter
cols = ['col', ...]
d = dict(Counter(np.concatenate(df[cols].apply(lambda x : x.str.split()) \
.values.ravel().tolist())))
print (d)
{'oshin3': 1, 'oshin4': 1, 'oshin1': 1, 'oshin': 1, 'oshin2': 2}
But if you have only one column (thanks Jon Clements):
d = dict(df['col'].str.split().map(Counter).sum())
print (d)
{'oshin3': 1, 'oshin4': 1, 'oshin1': 1, 'oshin': 1, 'oshin2': 2}
EDIT:
Another, faster solution from John Galt, thank you:
d = pd.Series(' '.join(df['col']).split()).value_counts().to_dict()
print (d)
{'oshin3': 1, 'oshin4': 1, 'oshin1': 1, 'oshin': 1, 'oshin2': 2}
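Since the question also asks about exporting to CSV and about handling any column, here is a minimal sketch of one way to do both. It uses stack and explode rather than the approaches above, so it assumes pandas 0.25 or newer. The helper name token_counts, the second column "other", and the CSV headers "token" and "count" are all hypothetical choices for illustration.

```python
import pandas as pd

# Sample data mirroring the question; the column "other" is a
# hypothetical addition to show the multi-column case.
df = pd.DataFrame({
    "col": ["oshin oshin1 oshin2", "oshin3 oshin2 oshin4"],
    "other": ["oshin oshin5", "oshin4"],
})

def token_counts(frame, cols):
    """Count whitespace-separated tokens across the given columns."""
    tokens = (
        frame[cols]
        .stack()       # all selected cells as a single Series
        .str.split()   # each cell becomes a list of tokens
        .explode()     # one token per row
    )
    return tokens.value_counts()

counts_one = token_counts(df, ["col"])            # a single column
counts_all = token_counts(df, ["col", "other"])   # several columns

# Turn the counts into a two-column table and serialize it as CSV.
# With no path, to_csv returns the CSV text; pass a path such as
# "counts.csv" to write a file instead.
table = counts_one.rename_axis("token").reset_index(name="count")
csv_text = table.to_csv(index=False)
print(csv_text)
```

The order of rows with equal counts is not guaranteed, so sort the table first if the CSV needs a stable order.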
Upvotes: 2