How to calculate aggregated summary statistics in Pandas dataframe

Question

I have a Pandas dataframe similar to this:

>>> df = pd.DataFrame(data=np.array([['red', 'cup', 1.50], ['blue', 'jug', 2.40], ['red', 'cup', 1.75], ['blue', 'cup', 2.30]]),
...                   columns=['colour', 'item', 'price'])
>>> df
  colour item price
0    red  cup   1.5
1   blue  jug   2.4
2    red  cup  1.75
3   blue  cup   2.3

What is the most concise way to calculate the summary statistics of the price for each of the possible combinations of colour and item?

Expected output e.g.:

colour     item      mean     stdev
red        cup       1.625    0.176
blue       jug       2.4      NA
blue       cup       2.3      NA

BENY · Accepted Answer

Notice the way you create the dataframe forced the columns price to string not numeric anymore, since numpy array only accept one dtype

Run:

df.price=pd.to_numeric(df.price)

I will using describe after groupby

df.groupby(['colour','item']).price.describe()# you can add reset_index() here
             count   mean       std  min     25%    50%     75%   max
colour item                                                          
blue   cup     1.0  2.300       NaN  2.3  2.3000  2.300  2.3000  2.30
       jug     1.0  2.400       NaN  2.4  2.4000  2.400  2.4000  2.40
red    cup     2.0  1.625  0.176777  1.5  1.5625  1.625  1.6875  1.75

Or you can using agg

df.groupby(['colour','item']).price.agg(['std','mean'])

How to calculate aggregated summary statistics in Pandas dataframe

Answers (2)

Related Questions