Peter
Peter

Reputation: 593

Group By and SUM column

I am looking at US census data:

                                                population  
State           County    
Alabama         Jefferson County                658466  
                Mobile County                   412992  
                Madison County                  334811  
Alaska          Anchorage Municipality          291826  
                Fairbanks North Star Borough    97581  
                Matanuska-Susitna Borough       88995  

Final output should sum up the population for each state:

State           SumOfPopulation 
Alabama         1406269                  
Alaska          478402  

My attempts with groupby yielded in below error

df.groupby('State')['population'].agg('sum') 

KeyError: 'STNAME'  

What the appropriate approach would look like?

Upvotes: 1

Views: 68

Answers (2)

jezrael
jezrael

Reputation: 862406

Your code working in pandas 0.20.0+, but better is omit agg and use sum only:

df.groupby('State', as_index=False)['population'].sum()

But for lower versions first reset_index for columns from MultiIndex:

df.reset_index().groupby('State', as_index=False)['population'].sum()

The simpliest solution is use sum:

df = df['population'].sum(level='State').reset_index()
#for seelct level by position
#df = df['population'].sum(level=0).reset_index()

print (df)
     State  population
0  Alabama     1406269
1   Alaska      478402

Upvotes: 0

sid8491
sid8491

Reputation: 6800

df.groupby('State', as_index=False)['population'].sum()

this will work fine

Upvotes: 1

Related Questions