Kakarot_7

Reputation: 342

groupby() and agg() in pandas

Here is the dataframe named 'census':

     SUMLEV  REGION  COUNTY     STNAME        CTYNAME           CENSUS2010POP   ESTIMATESBASE2010
0      50     3       1        Alabama      Autauga County        54571              54571
1      50     3       3        Alabama      Baldwin County        182265            182265
2      50     3       5        Alabama      Barbour County        27457              27457
3      50     4       3        Arizona      Cochise County        131346            131357
4      50     4       5        Arizona      Coconino County       134421            134437
5      50     4       7        Arizona      Gila County           53597              53597
6      50     4      21     California      Glenn County          28122              28122
7      50     4      23     California      Humboldt County       134623            134623
8      50     4      25     California      Imperial County       174528            17452

I want to calculate the sum and the average of 'CENSUS2010POP' for each state ('STNAME') and display the result as a dataframe.

Here's my code:

census.set_index('STNAME')
census.groupby(level=0).CENSUS2010POP.agg({'avg': np.mean, 'sum': np.sum}).head()

However, it raises the error: nested renamer is not supported

I also tried:

census.groupby('STNAME').CENSUS2010POP.agg({'avg':np.mean, 'sum':np.sum})

It gives the same error as above.

Upvotes: 1

Views: 99

Answers (1)

jezrael

Reputation: 862911

Because you are aggregating only one column, you can pass a list of (name, function) tuples instead of a dict:

df = census.groupby('STNAME').CENSUS2010POP.agg([('avg', np.mean), ('sum', np.sum)]).head()
print(df)
                      avg     sum
STNAME                           
Alabama      88097.666667  264293
Arizona     106454.666667  319364
California  112424.333333  337273

Or use named aggregation, where each keyword is an output column name and each value is a (column, function) pair:

census.groupby('STNAME').agg(avg=('CENSUS2010POP', np.mean),
                             sum=('CENSUS2010POP', np.sum)).head()
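
For reference, here is a minimal self-contained sketch of the named-aggregation approach. It rebuilds only the two relevant columns from the sample data in the question and passes the string names "mean" and "sum" rather than np.mean and np.sum. That substitution is my assumption, not part of the code above: it should give the same result, and recent pandas releases may warn when NumPy functions are passed to agg. The variable names by_named and by_series are illustrative only.

```python
import pandas as pd

# Only the two columns needed, copied from the sample data in the question.
census = pd.DataFrame({
    "STNAME": ["Alabama"] * 3 + ["Arizona"] * 3 + ["California"] * 3,
    "CENSUS2010POP": [54571, 182265, 27457,
                      131346, 134421, 53597,
                      28122, 134623, 174528],
})

# Named aggregation on the grouped DataFrame:
# keyword = output column name, value = (input column, aggregation).
by_named = census.groupby("STNAME").agg(
    avg=("CENSUS2010POP", "mean"),
    sum=("CENSUS2010POP", "sum"),
)

# The same idea on a single selected column:
# keyword = output column name, value = aggregation.
by_series = census.groupby("STNAME")["CENSUS2010POP"].agg(avg="mean", sum="sum")

print(by_named)
```

Both calls return a dataframe indexed by STNAME with the columns avg and sum, so neither needs the dict-of-names form that triggers the nested renamer error.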

Upvotes: 1
