Peter
Peter

Reputation: 441

aggregate by group and subgroup

I have a dataframe that looks like this:

Id     Country     amount
1       AT           10
2       BE           20
3       DE           30
1       AT           10
1       BE           20
3       DK           30

What I want to do is aggregate amount by ID, country, So my df should look like:

Id     Country     amount    AT_amount   BE_amount    DE_amount    DK_amount
1       AT           10       20          20            0           0
2       BE           20       0           20            0           0
3       DE           30       0           0             30          30
1       AT           10       20          20            0           0
1       BE           20       20          20            0           0
3       DK           30       0           0             30          30

I tried working with groupby, but using:

df['AT_amount'] = df.groupby(['Id', 'Country').sum(amount)

will not work, since then I will not get the values for all Id==1, but only for ID==1 and will give me a value regardless of the country.

I could first do this, set the values to 0 if country!=AT and then take a groupby maximum, but this seems a bit of a long way around.

To get these values for all countries it seems I will have to write a loop, or is there a quick way to create a new variable for all subgroup countries ?

Upvotes: 1

Views: 281

Answers (2)

jezrael
jezrael

Reputation: 863741

I think you can use pivot_table, add_suffix and last merge:

df1 = df.pivot_table(index='Id', 
                     columns='Country', 
                     values='amount', 
                     fill_value='0', 
                     aggfunc=sum).add_suffix('_amount').reset_index()

print df1    

Country  Id AT_amount BE_amount DE_amount DK_amount
0         1        20        20         0         0
1         2         0        20         0         0
2         3         0         0        30        30

print pd.merge(df,df1, on='Id', how='left')

   Id Country  amount AT_amount BE_amount DE_amount DK_amount
0   1      AT      10        20        20         0         0
1   2      BE      20         0        20         0         0
2   3      DE      30         0         0        30        30
3   1      AT      10        20        20         0         0
4   1      BE      20        20        20         0         0
5   3      DK      30         0         0        30        30

Upvotes: 1

Colin
Colin

Reputation: 2137

print df.join(df.pivot_table('amount', 'Id', 'Country', aggfunc=sum, fill_value=0).add_suffix('_amount'), on='Id')
   Id Country  amount  AT_amount  BE_amount  DE_amount  DK_amount
0   1      AT      10         20         20          0          0
1   2      BE      20          0         20          0          0
2   3      DE      30          0          0         30         30
3   1      AT      10         20         20          0          0
4   1      BE      20         20         20          0          0
5   3      DK      30          0          0         30         30

Upvotes: 1

Related Questions