Reputation: 13519
I have a dataframe like this:
ID A B Area
0 1 A1 B1 1.0
1 2 A1 B2 2.0
2 3 A1 B1 0.5
3 4 A1 B2 1.0
4 5 A2 B3 2.0
5 6 A2 B4 6.0
What I want to get out is this:
ID A B Area B as % of A
0 1 A1 B1 1.0 0.333
1 2 A1 B2 2.0 0.666
2 3 A1 B1 0.5 0.333
3 4 A1 B2 1.0 0.666
4 5 A2 B3 2.0 0.25
5 6 A2 B4 6.0 0.75
The aim is to add a new column which gives the proportion of the area of each floor A that is accounted for by each room type B (note this is by room type so the value in the output column should be the same for each unique combination of A and B).
So far what I have is:
>>> grouped = df.groupby(['A','B'])
>>> area_proportion = lambda x: (x['Area'] / x['Area'].sum())
>>> grouped.transform(area_proportion)
But this seems to be treating the lambda as by index of the original dataframe (I thought it would be by group) as it just returns:
Out[142]:
ID Area
0 1 1.0
1 2 2.0
2 3 0.5
3 4 1.0
4 5 2.0
5 6 6.0
I'm obviously misunderstanding something or missing a vital part of the docs. How should I be using groupby to get the result I need?
Upvotes: 3
Views: 731
Reputation: 77951
Possibly:
>>> aggr = lambda df, key, col: df.groupby(key)[col].transform('sum')
>>> df['B as % of A'] = aggr(df, ('A', 'B'), 'Area') / aggr(df, 'A', 'Area')
>>> df
ID A B Area B as % of A
0 1 A1 B1 1.0 0.3333
1 2 A1 B2 2.0 0.6667
2 3 A1 B1 0.5 0.3333
3 4 A1 B2 1.0 0.6667
4 5 A2 B3 2.0 0.2500
5 6 A2 B4 6.0 0.7500
Upvotes: 3