Oshin Patwa

Reputation: 115

Get word counts from a pandas DataFrame where each column holds a list of words

I have a pandas DataFrame, say:

1. oshin oshin1 oshin2

2. oshin3 oshin2 oshin4

I want a counter over all the words, so that my output is:

oshin:1 oshin1:1 oshin2:2 oshin3:1 oshin4:1

I need to export the output to a CSV file, as it is going to be really long. How do I do this in pandas? Or, for that matter, how can I do it for any column in pandas?

Upvotes: 1

Views: 972

Answers (1)

jezrael

Reputation: 862641

I think you first need to create lists in each column with apply and str.split, then convert to a numpy array with values and flatten it with numpy.ravel. Convert the result to a list, concatenate the word lists, apply Counter, and finally convert to a dict:

print (df)
                    col
0   oshin oshin1 oshin2
1  oshin3 oshin2 oshin4

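from collections import Counter
import numpy as np

# '...' stands for any further column names to include
cols = ['col', ...]
# split every cell into words, flatten all cells, then count
d = dict(Counter(np.concatenate(df[cols].apply(lambda x: x.str.split())
                                        .values.ravel().tolist())))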
print (d)
{'oshin3': 1, 'oshin4': 1, 'oshin1': 1, 'oshin': 1, 'oshin2': 2}
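For reference, here is a self-contained sketch of the same multi-column idea. The second column `col2` and its contents are made up for illustration, and `itertools.chain` is swapped in for `np.concatenate` to flatten the word lists; treat it as one possible variant rather than the only way to do it.

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "col": ["oshin oshin1 oshin2", "oshin3 oshin2 oshin4"],
    "col2": ["oshin oshin5", "oshin2"],  # hypothetical second column
})

cols = ["col", "col2"]

# Split every cell into a list of words, then flatten the 2-D
# array of lists into a 1-D array of lists.
word_lists = df[cols].apply(lambda s: s.str.split()).to_numpy().ravel()

# Chain the lists into one stream of words and count them.
counts = Counter(chain.from_iterable(word_lists))
print(dict(counts))
```

Using `chain.from_iterable` keeps the keys as plain Python strings instead of numpy string objects, which can be convenient when the dict is serialized later.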

If there is only one column (thanks Jon Clements):

d = dict(df['col'].str.split().map(Counter).sum())
print (d)
{'oshin3': 1, 'oshin4': 1, 'oshin1': 1, 'oshin': 1, 'oshin2': 2}
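The one-liner above relies on Counter objects supporting addition, so summing the per-row counters merges them. A minimal sketch of that effect using only the standard library (the `rows` list simply mirrors the sample column):

```python
from collections import Counter

rows = ["oshin oshin1 oshin2", "oshin3 oshin2 oshin4"]

# One Counter per row, as .map(Counter) would build after splitting.
per_row = [Counter(r.split()) for r in rows]

# Adding Counters merges their counts key by key.
total = sum(per_row, Counter())
print(dict(total))
```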

EDIT:

Another faster solution from John Galt, thank you:

d = pd.Series(' '.join(df['col']).split()).value_counts().to_dict()
print (d)
{'oshin3': 1, 'oshin4': 1, 'oshin1': 1, 'oshin': 1, 'oshin2': 2}
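Since the question also asks about exporting to CSV, one possible way is to keep the value_counts result as a Series and write it out with to_csv instead of converting to a dict. The column names `word` and `count` are my own choice, and the sketch writes to an in-memory buffer so it runs anywhere; pass a file path to `to_csv` instead to write a real file.

```python
import io

import pandas as pd

df = pd.DataFrame({"col": ["oshin oshin1 oshin2", "oshin3 oshin2 oshin4"]})

# Count every word in the column.
counts = pd.Series(" ".join(df["col"]).split()).value_counts()

# Turn the Series into a two-column frame with explicit headers,
# sorted so that the output order is deterministic.
table = (counts.rename_axis("word")
               .reset_index(name="count")
               .sort_values(["count", "word"], ascending=[False, True]))

# Write to an in-memory buffer; use a path such as
# "word_counts.csv" (hypothetical name) to write to disk.
buffer = io.StringIO()
table.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```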

Upvotes: 2
