Reputation: 2657
I have a Pandas dataframe similar to this:
>>> df = pd.DataFrame(data=np.array([['red', 'cup', 1.50], ['blue', 'jug', 2.40], ['red', 'cup', 1.75], ['blue', 'cup', 2.30]]),
... columns=['colour', 'item', 'price'])
>>> df
colour item price
0 red cup 1.5
1 blue jug 2.4
2 red cup 1.75
3 blue cup 2.3
What is the most concise way to calculate the summary statistics of the price for each of the possible combinations of colour and item?
Expected output e.g.:
colour item mean stdev
red cup 1.625 0.176
blue jug 2.4 NA
blue cup 2.3 NA
Upvotes: 5
Views: 1905
Reputation: 323396
Notice the way you create the dataframe forced the columns price to string not numeric anymore, since numpy
array
only accept one dtype
Run:
df.price=pd.to_numeric(df.price)
I will using describe
after groupby
df.groupby(['colour','item']).price.describe()# you can add reset_index() here
count mean std min 25% 50% 75% max
colour item
blue cup 1.0 2.300 NaN 2.3 2.3000 2.300 2.3000 2.30
jug 1.0 2.400 NaN 2.4 2.4000 2.400 2.4000 2.40
red cup 2.0 1.625 0.176777 1.5 1.5625 1.625 1.6875 1.75
Or you can using agg
df.groupby(['colour','item']).price.agg(['std','mean'])
Upvotes: 4
Reputation: 42946
You can use groupby
in combination with .agg
and pass it the mean
and std
function:
print(df.groupby(['colour', 'item']).agg({'price':['mean', 'std']}).reset_index())
colour item price
mean std
0 blue cup 2.300 NaN
1 blue jug 2.400 NaN
2 red cup 1.625 0.176777
Upvotes: 2