groupby comma-separated values in single DataFrame column python/pandas

Question

As an example, let's say I have a python pandas DataFrame that is the following:

#  PERSON  THINGS
0  Joe     Candy Corn, Popsicles
1  Jane    Popsicles
2  John    Candy Corn, Ice Packs
3  Lefty   Ice Packs, Hot Dogs

I would like to use the pandas groupby functionality to have the following output:

THINGS        COUNT
Candy Corn    2
Popsicles     2
Ice Packs     2
Hot Dogs      1

I generally understand the following groupby command:

df.groupby(['THINGS']).count()

But the output is not by individual item, but by the entire string. I think I understand why this is, but it's not clear to me how to best approach the problem to get the desired output instead of the following:

THINGS                  PERSON
Candy Corn, Ice Packs   1
Candy Corn, Popsicles   1
Ice Packs, Hot Dogs     1
Popsicles               1

Does pandas have a function like the LIKE in SQL, or am I thinking about how to do this wrong in pandas?

Any assistance appreciated.

Zero · Accepted Answer

Create a series by splitting words, and use value_counts

In [292]: pd.Series(df.THINGS.str.cat(sep=', ').split(', ')).value_counts()
Out[292]:
Popsicles     2
Ice Packs     2
Candy Corn    2
Hot Dogs      1
dtype: int64

groupby comma-separated values in single DataFrame column python/pandas

Answers (2)

Related Questions