Javide
Javide

Reputation: 2657

How to calculate aggregated summary statistics in Pandas dataframe

I have a Pandas dataframe similar to this:

>>> df = pd.DataFrame(data=np.array([['red', 'cup', 1.50], ['blue', 'jug', 2.40], ['red', 'cup', 1.75], ['blue', 'cup', 2.30]]),
...                   columns=['colour', 'item', 'price'])
>>> df
  colour item price
0    red  cup   1.5
1   blue  jug   2.4
2    red  cup  1.75
3   blue  cup   2.3

What is the most concise way to calculate the summary statistics of the price for each of the possible combinations of colour and item?

Expected output e.g.:

colour     item      mean     stdev
red        cup       1.625    0.176
blue       jug       2.4      NA
blue       cup       2.3      NA

Upvotes: 5

Views: 1905

Answers (2)

BENY
BENY

Reputation: 323396

Notice the way you create the dataframe forced the columns price to string not numeric anymore, since numpy array only accept one dtype

Run:

df.price=pd.to_numeric(df.price)

I will using describe after groupby

df.groupby(['colour','item']).price.describe()# you can add reset_index() here
             count   mean       std  min     25%    50%     75%   max
colour item                                                          
blue   cup     1.0  2.300       NaN  2.3  2.3000  2.300  2.3000  2.30
       jug     1.0  2.400       NaN  2.4  2.4000  2.400  2.4000  2.40
red    cup     2.0  1.625  0.176777  1.5  1.5625  1.625  1.6875  1.75

Or you can using agg

df.groupby(['colour','item']).price.agg(['std','mean'])

Upvotes: 4

Erfan
Erfan

Reputation: 42946

You can use groupby in combination with .agg and pass it the mean and std function:

print(df.groupby(['colour', 'item']).agg({'price':['mean', 'std']}).reset_index())

  colour item  price          
                mean       std
0   blue  cup  2.300       NaN
1   blue  jug  2.400       NaN
2    red  cup  1.625  0.176777

Upvotes: 2

Related Questions