Javide
Javide

Reputation: 2657

How to flatten the result of a groupby operation in Pandas?

Using a Pandas dataframe, is there a way to flatten the result of a groupby operation without having to use a temporary dataframe and then merge it to the original one?

Let's say I need to create a "result" column which depends on an aggregating operation, like in this scenario:

import pandas as pd

df = pd.DataFrame({'box': [1,1,1,2,2,3,3,3,3], 'item': ['apple', 'durian', 'pear', 'orange', 'banana', 'apple', 'pear', 'durian', 'orange']})

df['subindex'] = df.groupby('box').cumcount()+1

tmp_df = df.groupby('box')['subindex'].apply(lambda x: (1/x**2).sum()).reset_index(name='result')

df = pd.merge(df, tmp_df, how='inner', on='box')

Is there a way I can achieve the same thing in the last two lines in one line of code only, without having to merge two distinct dataframes?

Upvotes: 2

Views: 157

Answers (2)

Scott Boston
Scott Boston

Reputation: 153500

As @YOBEN_S points out you can combine in to one statement, however the key to what you are looking for is transform.

import pandas as pd
df = pd.DataFrame({'box': [1,1,1,2,2,3,3,3,3], 'item': ['apple', 'durian', 'pear', 'orange', 'banana', 'apple', 'pear', 'durian', 'orange']})
df['subindex'] = df.groupby('box').cumcount()+1
df['result'] = df.groupby('box')['subindex'].transform(lambda x: (1/x**2).sum())
print(df)

Output:

   box    item  subindex    result
0    1   apple         1  1.361111
1    1  durian         2  1.361111
2    1    pear         3  1.361111
3    2  orange         1  1.250000
4    2  banana         2  1.250000
5    3   apple         1  1.423611
6    3    pear         2  1.423611
7    3  durian         3  1.423611
8    3  orange         4  1.423611

transform takes a column and applies as a function to that column and transforms those values into different values but it keeps the same shape and order of the original dataframe.

Upvotes: 1

BENY
BENY

Reputation: 323326

We can combine two steps into one

df['result']=(df.groupby('box').cumcount()+1).groupby(df['box']).\
                  transform(lambda x : (1/x**2).sum())
0    1.361111
1    1.361111
2    1.361111
3    1.250000
4    1.250000
5    1.423611
6    1.423611
7    1.423611
8    1.423611
dtype: float64

Upvotes: 3

Related Questions