Reputation: 153
Problem: I am attempting a groupby on a simple dataframe (downloadable csv), followed by agg to return aggregate values (size, sum, mean, std deviation) for a column. What seems like a simple problem raises an unexpected error.
Top15.groupby('Continent')['Pop Est'].agg(np.mean, np.std...etc)
# returns
ValueError: No axis named <function std at 0x7f16841512f0> for object type <class 'pandas.core.series.Series'>
What I am trying to get is a df with index set to continents and columns ['size', 'sum', 'mean', 'std']
Example Code
import pandas as pd
import numpy as np
# Create df
df = pd.DataFrame({'Country':['Australia','China','America','Germany'],'Pop Est':['123','234','345','456'],'Continent':['Asia','Asia','North America','Europe']})
# group and agg
df = df.groupby('Continent')['Pop Est'].agg('size','sum','np.mean','np.std')
Upvotes: 2
Views: 4153
Reputation: 11460
mean and std can only be computed on numeric values, so when you create your dataframe don't input your numbers as strings:
df = pd.DataFrame({'Country':['Australia','China','America','Germany'],'PopEst':[123,234,345,456],'Continent':['Asia','Asia','North America','Europe']})
Then pass the aggregations to agg as a single list rather than as separate arguments. I think this will get you what you want:
grouped = df.groupby('Continent')
grouped['PopEst'].agg(['size','sum','mean','std'])
               size  sum   mean        std
Continent
Asia              2  357  178.5  78.488853
Europe            1  456  456.0        NaN
North America     1  345  345.0        NaN
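If the numbers arrive as strings (for example from the csv), a minimal sketch of the same approach is to convert the column first and then aggregate. This assumes the original column name 'Pop Est' from the question and that every value parses cleanly as a number; pd.to_numeric is one way to do the conversion, not necessarily the only one.

```python
import pandas as pd

# Same data as in the question, with the population stored as strings
df = pd.DataFrame({
    'Country': ['Australia', 'China', 'America', 'Germany'],
    'Pop Est': ['123', '234', '345', '456'],
    'Continent': ['Asia', 'Asia', 'North America', 'Europe'],
})

# Convert the string column to a numeric dtype
# (assumption: all values are valid numbers)
df['Pop Est'] = pd.to_numeric(df['Pop Est'])

# Pass the aggregations as one list, not as separate arguments
result = df.groupby('Continent')['Pop Est'].agg(['size', 'sum', 'mean', 'std'])
print(result)
```

std is NaN for Europe and North America because each of those groups has only one row, and the sample standard deviation is undefined for a single value.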
Upvotes: 5