Kyle R. Conway
Kyle R. Conway

Reputation: 33

groupby comma-separated values in single DataFrame column python/pandas

As an example, let's say I have a python pandas DataFrame that is the following:

#  PERSON  THINGS
0  Joe     Candy Corn, Popsicles
1  Jane    Popsicles
2  John    Candy Corn, Ice Packs
3  Lefty   Ice Packs, Hot Dogs

I would like to use the pandas groupby functionality to have the following output:

THINGS        COUNT
Candy Corn    2
Popsicles     2
Ice Packs     2
Hot Dogs      1

I generally understand the following groupby command:

df.groupby(['THINGS']).count()

But the output is not by individual item, but by the entire string. I think I understand why this is, but it's not clear to me how to best approach the problem to get the desired output instead of the following:

THINGS                  PERSON
Candy Corn, Ice Packs   1
Candy Corn, Popsicles   1
Ice Packs, Hot Dogs     1
Popsicles               1

Does pandas have a function like the LIKE in SQL, or am I thinking about how to do this wrong in pandas?

Any assistance appreciated.

Upvotes: 3

Views: 588

Answers (2)

harpan
harpan

Reputation: 8631

You need to split THINGS by , and flatten the series and count values.

pd.Series([item.strip() for sublist in df['THINGS'].str.split(',') for item in sublist]).value_counts()

Output:

Candy Corn    2
Popsicles     2
Ice Packs     2
Hot Dogs      1
dtype: int64

Upvotes: 2

Zero
Zero

Reputation: 76947

Create a series by splitting words, and use value_counts

In [292]: pd.Series(df.THINGS.str.cat(sep=', ').split(', ')).value_counts()
Out[292]:
Popsicles     2
Ice Packs     2
Candy Corn    2
Hot Dogs      1
dtype: int64

Upvotes: 7

Related Questions