Reputation: 33
I have a dataframe where some cells contain lists of multiple values, like so:
import pandas as pd
df = pd.DataFrame(
    {'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
     'value': [20, 30, 20, 10]
    }
)
df
Out[10]:
    category  value
0  [x, y, z]     20
1        [x]     30
2     [y, z]     20
3     [x, z]     10
I'd like to group the data by the unique elements in the category column and capture, for each element, both the number of rows it appears in and the mean of value across those rows.
Intended output should look like:
   count  mean
x      3  20
y      2  20
z      3  16.7
I'm relatively familiar with simple groupby functions, and am able to create a flat list of unique elements (i.e. [x,y,z]). However, I'm not sure how to use that flat list to transform the data as desired above. Help much appreciated!
Upvotes: 3
Views: 650
Reputation: 75080
Use explode (pandas 0.25+):
df.explode('category').groupby('category')['value'].agg(['count','mean'])
          count       mean
category
x             3  20.000000
y             2  20.000000
z             3  16.666667
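For reference, here is a self-contained sketch of the same approach that also keeps the intermediate exploded frame, which makes it easier to see why the groupby works. It assumes the question's x, y, z are plain strings; the names `exploded` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question, with the labels assumed to be strings
df = pd.DataFrame({
    'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
    'value': [20, 30, 20, 10],
})

# explode gives every list element its own row and repeats the
# original index and 'value' for each element of that row's list
exploded = df.explode('category')

# 'category' now holds scalar labels, so an ordinary groupby applies
result = exploded.groupby('category')['value'].agg(['count', 'mean'])
print(result)
```

Note that a row whose list contains the same label twice would be counted twice for that label; if that can happen in your data, deduplicate each list before exploding.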
For pandas versions below 0.25:
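import numpy as np
# repeat each row's value once per list element, then attach the flattened labels
(df.loc[df.index.repeat(df['category'].str.len()), ['value']]
   .assign(category=np.concatenate(df['category']))
   .groupby('category')['value'].agg(['count', 'mean']))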
          count       mean
category
x             3  20.000000
y             2  20.000000
z             3  16.666667
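The one-liner for older pandas packs three steps together. A rough step-by-step sketch is below, again assuming string labels; the intermediate names `lengths`, `repeated` and `flat` are made up for illustration and are not part of any pandas API.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'category': [['x', 'y', 'z'], ['x'], ['y', 'z'], ['x', 'z']],
    'value': [20, 30, 20, 10],
})

# 1. Length of each list: 3, 1, 2, 2
lengths = df['category'].str.len()

# 2. Repeat each index label once per list element and select 'value',
#    so every element gets a copy of its row's value
repeated = df.loc[df.index.repeat(lengths), ['value']]

# 3. Flatten the lists into one array, in the same row order as step 2
flat = np.concatenate(df['category'].tolist())

# Attach the flat labels positionally, then group as usual
result = (repeated.assign(category=flat)
                  .groupby('category')['value']
                  .agg(['count', 'mean']))
print(result)
```

Steps 2 and 3 line up only because both walk the rows in the same order, so the frame should not be reordered between them.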
Upvotes: 4