Reputation: 441
I have a dataframe that looks like this:
Id Country amount
1 AT 10
2 BE 20
3 DE 30
1 AT 10
1 BE 20
3 DK 30
What I want to do is aggregate amount by ID, country, So my df should look like:
Id Country amount AT_amount BE_amount DE_amount DK_amount
1 AT 10 20 20 0 0
2 BE 20 0 20 0 0
3 DE 30 0 0 30 30
1 AT 10 20 20 0 0
1 BE 20 20 20 0 0
3 DK 30 0 0 30 30
I tried working with groupby, but using:
df['AT_amount'] = df.groupby(['Id', 'Country').sum(amount)
will not work, since then I will not get the values for all Id==1, but only for ID==1 and will give me a value regardless of the country.
I could first do this, set the values to 0 if country!=AT and then take a groupby maximum, but this seems a bit of a long way around.
To get these values for all countries it seems I will have to write a loop, or is there a quick way to create a new variable for all subgroup countries ?
Upvotes: 1
Views: 281
Reputation: 863741
I think you can use pivot_table
, add_suffix
and last merge
:
df1 = df.pivot_table(index='Id',
columns='Country',
values='amount',
fill_value='0',
aggfunc=sum).add_suffix('_amount').reset_index()
print df1
Country Id AT_amount BE_amount DE_amount DK_amount
0 1 20 20 0 0
1 2 0 20 0 0
2 3 0 0 30 30
print pd.merge(df,df1, on='Id', how='left')
Id Country amount AT_amount BE_amount DE_amount DK_amount
0 1 AT 10 20 20 0 0
1 2 BE 20 0 20 0 0
2 3 DE 30 0 0 30 30
3 1 AT 10 20 20 0 0
4 1 BE 20 20 20 0 0
5 3 DK 30 0 0 30 30
Upvotes: 1
Reputation: 2137
print df.join(df.pivot_table('amount', 'Id', 'Country', aggfunc=sum, fill_value=0).add_suffix('_amount'), on='Id')
Id Country amount AT_amount BE_amount DE_amount DK_amount
0 1 AT 10 20 20 0 0
1 2 BE 20 0 20 0 0
2 3 DE 30 0 0 30 30
3 1 AT 10 20 20 0 0
4 1 BE 20 20 20 0 0
5 3 DK 30 0 0 30 30
Upvotes: 1