Ryan Lechner

Reputation: 33

Pandas - Dataframe has column with lists. How can I groupby the elements within the list?

I have a dataframe where some cells contain lists of multiple values, like so:

import pandas as pd

df = pd.DataFrame(
    {'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
     'value': [20, 30, 20, 10]
    }
)

df

Out[10]: 
     category  value
0    [x, y, z] 20
1    [x]       30
2    [y, z]    20
3    [x, z]    10

I'd like to group the data by the unique elements of the category column and, for each element, get both the number of rows whose list contains it and the mean of value across those rows.

Intended output should look like:

     count  mean
x    3      20
y    2      20
z    3      16.7

I'm relatively familiar with simple groupby functions, and am able to create a flat list of unique elements (i.e. [x,y,z]). However, I'm not sure how to use that flat list to transform the data as desired above. Help much appreciated!
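A minimal sketch of the flat-list approach the question describes: build the list of unique elements, then for each element use a boolean mask to select the rows whose list contains it. It assumes the list entries are strings ('x', 'y', 'z'); the names `elements`, `rows` and `result` are illustrative only. It scans the frame once per element, so it is simple rather than fast.

```python
import pandas as pd

df = pd.DataFrame({
    'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
    'value': [20, 30, 20, 10],
})

# Flat, sorted list of unique elements: ['x', 'y', 'z']
elements = sorted({e for cats in df['category'] for e in cats})

rows = {}
for e in elements:
    # Boolean mask: True for rows whose list contains this element
    mask = df['category'].apply(lambda cats: e in cats)
    rows[e] = {'count': int(mask.sum()), 'mean': df.loc[mask, 'value'].mean()}

# One row per element, with 'count' and 'mean' columns
result = pd.DataFrame.from_dict(rows, orient='index')
print(result.round(1))
```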

Upvotes: 3

Views: 650

Answers (1)

anky

Reputation: 75080

For pandas 0.25+, use explode:

df.explode('category').groupby('category')['value'].agg(['count','mean'])

          count       mean
category                  
x             3  20.000000
y             2  20.000000
z             3  16.666667
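To see why this works, it may help to look at the intermediate frame. The sketch below (assuming pandas 0.25+ and string list entries) prints the exploded frame, where each list element gets its own row and the original index label is repeated, and then rounds the mean to one decimal to match the layout requested in the question. The `.round(1)` step is an addition for display only.

```python
import pandas as pd

df = pd.DataFrame({
    'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
    'value': [20, 30, 20, 10],
})

# One row per list element; the original index labels are repeated (0, 0, 0, 1, ...)
exploded = df.explode('category')
print(exploded)

# Group the exploded rows by element, then round the mean for display
result = exploded.groupby('category')['value'].agg(['count', 'mean']).round(1)
print(result)
```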

For pandas versions below 0.25:

import numpy as np

(df.loc[df.index.repeat(df['category'].str.len()), ['value']]
   .assign(category=np.concatenate(df['category'].tolist()))
   .groupby('category')['value'].agg(['count', 'mean']))

          count       mean
category                  
x             3  20.000000
y             2  20.000000
z             3  16.666667
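The pre-0.25 version can be read as three steps that must line up row for row. The sketch below splits them out (the names `lengths`, `repeated_index`, `flat` and `long_df` are illustrative, not part of the original answer): repeat each index label once per list element, flatten the lists in the same order, and pair the two positionally.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
    'value': [20, 30, 20, 10],
})

# Step 1: length of each list, i.e. how often each row must be repeated
lengths = df['category'].str.len()               # 3, 1, 2, 2

# Step 2: repeat each index label that many times
repeated_index = df.index.repeat(lengths)        # 0, 0, 0, 1, 2, 2, 3, 3

# Step 3: flatten the lists in the same row order
flat = np.concatenate(df['category'].tolist())   # x, y, z, x, y, z, x, z

# The two sequences have equal length, so position i of each belongs together
assert len(repeated_index) == len(flat)

long_df = df.loc[repeated_index, ['value']].assign(category=flat)
result = long_df.groupby('category')['value'].agg(['count', 'mean'])
print(result)
```

Because `assign` receives a plain NumPy array, it is attached by position, so the duplicate index labels in `long_df` do not cause misalignment.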

Upvotes: 4
