Reputation: 33
As an example, let's say I have a python pandas DataFrame that is the following:
# PERSON THINGS
0 Joe Candy Corn, Popsicles
1 Jane Popsicles
2 John Candy Corn, Ice Packs
3 Lefty Ice Packs, Hot Dogs
I would like to use the pandas groupby functionality to have the following output:
THINGS COUNT
Candy Corn 2
Popsicles 2
Ice Packs 2
Hot Dogs 1
I generally understand the following groupby command:
df.groupby(['THINGS']).count()
But the output is not by individual item, but by the entire string. I think I understand why this is, but it's not clear to me how to best approach the problem to get the desired output instead of the following:
THINGS PERSON
Candy Corn, Ice Packs 1
Candy Corn, Popsicles 1
Ice Packs, Hot Dogs 1
Popsicles 1
Does pandas have a function like the LIKE in SQL, or am I thinking about how to do this wrong in pandas?
Any assistance appreciated.
Upvotes: 3
Views: 588
Reputation: 8631
You need to split THINGS
by ,
and flatten the series and count values.
pd.Series([item.strip() for sublist in df['THINGS'].str.split(',') for item in sublist]).value_counts()
Output:
Candy Corn 2
Popsicles 2
Ice Packs 2
Hot Dogs 1
dtype: int64
Upvotes: 2
Reputation: 76947
Create a series by splitting words, and use value_counts
In [292]: pd.Series(df.THINGS.str.cat(sep=', ').split(', ')).value_counts()
Out[292]:
Popsicles 2
Ice Packs 2
Candy Corn 2
Hot Dogs 1
dtype: int64
Upvotes: 7