Reputation: 630
To obtain results executed on groupby-data with the same level of detail as the original DataFrame (same observation count) I have used the transform function.
Example: Original dataframe
name, year, grade
Jack, 2010, 6
Jack, 2011, 7
Rosie, 2010, 7
Rosie, 2011, 8
After groupby transform
name, year, grade, average grade
Jack, 2010, 6, 6.5
Jack, 2011, 7, 6.5
Rosie, 2010, 7, 7.5
Rosie, 2011, 8, 7.5
However, with more advanced functions based on multiple columns things get more complicated. What puzzles me is that I seem to be unable to access multiple columns in a groupby-transform combination.
df = pd.DataFrame({'a':[1,2,3,4,5,6],
'b':[1,2,3,4,5,6],
'c':['q', 'q', 'q', 'q', 'w', 'w'],
'd':['z','z','z','o','o','o']})
def f(x):
y=sum(x['a'])+sum(x['b'])
return(y)
df['e'] = df.groupby(['c','d']).transform(f)
Gives me:
KeyError: ('a', 'occurred at index a')
Though I know that following does work:
df.groupby(['c','d']).apply(f)
What causes this behavior and how can I obtain something like this:
a b c d e
1 1 q z 12
2 2 q z 12
3 3 q z 12
4 4 q o 8
5 5 w o 22
6 6 w o 22
Upvotes: 11
Views: 21202
Reputation: 28233
for this particular case you could do:
g = df.groupby(['c', 'd'])
df['e'] = g.a.transform('sum') + g.b.transform('sum')
df
# outputs
a b c d e
0 1 1 q z 12
1 2 2 q z 12
2 3 3 q z 12
3 4 4 q o 8
4 5 5 w o 22
5 6 6 w o 22
if you can construct the final result by a linear combination of the independent transforms on the same groupby, this method would work.
otherwise, you'd use a groupby-apply
and then merge back to the original df.
example:
_ = df.groupby(['c','d']).apply(lambda x: sum(x.a+x.b)).rename('e').reset_index()
df.merge(_, on=['c','d'])
# same output as above.
Upvotes: 14
Reputation: 164613
You can use GroupBy
+ transform
with sum
twice:
df['e'] = df.groupby(['c', 'd'])[['a', 'b']].transform('sum').sum(1)
print(df)
a b c d e
0 1 1 q z 12
1 2 2 q z 12
2 3 3 q z 12
3 4 4 q o 8
4 5 5 w o 22
5 6 6 w o 22
Upvotes: 3