Pandas groupby stored in a new dataframe

Question

I have the following code:

import pandas as pd
df1 = pd.DataFrame({'Counterparty':['Bank','Bank','GSE','PSE'],
            'Sub Cat':['Tier1','Small','Small', 'Small'],
            'Location':['US','US','UK','UK'],
            'Amount':[50, 55, 65, 55],
            'Amount1':[1,2,3,4]})

df2=df1.groupby(['Counterparty','Location'])[['Amount']].sum()
df2.dtypes
df1.dtypes

The df2 data frame does not have the columns that I am aggregating across (Counterparty and Location). Any ideas why this is the case? Both Amount and Amount1 are numeric fields. I just want to sum across Amount and aggregate across Amount1.

jezrael · Accepted Answer

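By default, groupby moves the grouping columns into the index of the result (a MultiIndex here), so they no longer appear as regular columns. To get them back as columns, add the as_index=False parameter or call reset_index: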

df2=df1.groupby(['Counterparty','Location'])[['Amount']].sum().reset_index()
print (df2)
  Counterparty Location  Amount
0         Bank       US     105
1          GSE       UK      65
2          PSE       UK      55

df2=df1.groupby(['Counterparty','Location'], as_index=False)[['Amount']].sum()
print (df2)
  Counterparty Location  Amount
0         Bank       US     105
1          GSE       UK      65
2          PSE       UK      55
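A minimal sketch that makes the "missing" columns visible: with the default as_index=True, the grouping keys become the levels of the result's index rather than columns, and reset_index() turns them back into columns. The variable names grouped and flat are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'Counterparty': ['Bank', 'Bank', 'GSE', 'PSE'],
                    'Sub Cat': ['Tier1', 'Small', 'Small', 'Small'],
                    'Location': ['US', 'US', 'UK', 'UK'],
                    'Amount': [50, 55, 65, 55],
                    'Amount1': [1, 2, 3, 4]})

# Default groupby (as_index=True): the keys end up in a MultiIndex
grouped = df1.groupby(['Counterparty', 'Location'])[['Amount']].sum()
print(list(grouped.index.names))  # ['Counterparty', 'Location']
print(list(grouped.columns))      # ['Amount']

# reset_index() moves the index levels back into ordinary columns
flat = grouped.reset_index()
print(list(flat.columns))         # ['Counterparty', 'Location', 'Amount']
```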

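If you aggregate all remaining columns, pandas versions before 2.0 automatically exclude nuisance (non-numeric) columns, so the column Sub Cat is omitted. Since pandas 2.0, nuisance columns are no longer dropped silently, so pass numeric_only=True to get the same result:

df2=df1.groupby(['Counterparty','Location']).sum(numeric_only=True).reset_index()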
print (df2)
  Counterparty Location  Amount  Amount1
0         Bank       US     105        3
1          GSE       UK      65        3
2          PSE       UK      55        4


df2=df1.groupby(['Counterparty','Location'], as_index=False).sum(numeric_only=True)
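Because the handling of nuisance columns changed between pandas versions, a sketch that does not depend on it either selects the numeric columns explicitly or uses named aggregation (available since pandas 0.25), which also lets each column get its own aggregation function. The names keys, sums and named are illustrative; both variants produce the same table as above.

```python
import pandas as pd

df1 = pd.DataFrame({'Counterparty': ['Bank', 'Bank', 'GSE', 'PSE'],
                    'Sub Cat': ['Tier1', 'Small', 'Small', 'Small'],
                    'Location': ['US', 'US', 'UK', 'UK'],
                    'Amount': [50, 55, 65, 55],
                    'Amount1': [1, 2, 3, 4]})

keys = ['Counterparty', 'Location']

# Option 1: select the numeric columns explicitly, so the string
# column 'Sub Cat' never enters the aggregation
sums = df1.groupby(keys)[['Amount', 'Amount1']].sum().reset_index()

# Option 2: named aggregation, output_name=(input_column, function);
# a different function (e.g. 'mean') could be used per column
named = df1.groupby(keys).agg(Amount=('Amount', 'sum'),
                              Amount1=('Amount1', 'sum')).reset_index()
print(named)
```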
