Reputation: 2657
Using a Pandas dataframe, is there a way to flatten the result of a groupby operation without having to use a temporary dataframe and then merge it to the original one?
Let's say I need to create a "result" column which depends on an aggregating operation, like in this scenario:
import pandas as pd
df = pd.DataFrame({'box': [1,1,1,2,2,3,3,3,3], 'item': ['apple', 'durian', 'pear', 'orange', 'banana', 'apple', 'pear', 'durian', 'orange']})
df['subindex'] = df.groupby('box').cumcount()+1
tmp_df = df.groupby('box')['subindex'].apply(lambda x: (1/x**2).sum()).reset_index(name='result')
df = pd.merge(df, tmp_df, how='inner', on='box')
Is there a way I can achieve the same thing in the last two lines in one line of code only, without having to merge two distinct dataframes?
Upvotes: 2
Views: 157
Reputation: 153500
As @YOBEN_S points out you can combine in to one statement, however the key to what you are looking for is transform
.
import pandas as pd
df = pd.DataFrame({'box': [1,1,1,2,2,3,3,3,3], 'item': ['apple', 'durian', 'pear', 'orange', 'banana', 'apple', 'pear', 'durian', 'orange']})
df['subindex'] = df.groupby('box').cumcount()+1
df['result'] = df.groupby('box')['subindex'].transform(lambda x: (1/x**2).sum())
print(df)
Output:
box item subindex result
0 1 apple 1 1.361111
1 1 durian 2 1.361111
2 1 pear 3 1.361111
3 2 orange 1 1.250000
4 2 banana 2 1.250000
5 3 apple 1 1.423611
6 3 pear 2 1.423611
7 3 durian 3 1.423611
8 3 orange 4 1.423611
transform
takes a column and applies as a function to that column and transforms those values into different values but it keeps the same shape and order of the original dataframe.
Upvotes: 1
Reputation: 323326
We can combine two steps into one
df['result']=(df.groupby('box').cumcount()+1).groupby(df['box']).\
transform(lambda x : (1/x**2).sum())
0 1.361111
1 1.361111
2 1.361111
3 1.250000
4 1.250000
5 1.423611
6 1.423611
7 1.423611
8 1.423611
dtype: float64
Upvotes: 3