Aris
Aris

Reputation: 17

sum rows value based on condition in python dataframe

df

Hello Community,

in my dataframe i have three columns as follow:

restaurant_id, annee, moyenne_etoile_annee.

for each restaurant_id i would like to substract the "moyenne_etoiles_annee' of the last year minus the one from the first year.

By example for the first restaurant_id df[diff]=moyenne_etoiles_annee(2017)-moyenne_etoiles_annee(2015) 2-2.66=-0.66

I tried to .agg and select first and last value but wasn't able to execute properly.

Sample:

df = pd.DataFrame({"restaurant_id": [1,1,2,2],
                   "annee": [2015,2016,2019,2022],
                   "moyenne_etoile_annee": [1.2,1.4,1.3,1.3]}) 

Upvotes: 1

Views: 519

Answers (1)

jezrael
jezrael

Reputation: 862591

Solution working well if unique years per restaurant_id like in question data.

First use DataFrame.sort_values per both columns (if necessary) for first and last annee per restaurant_id, then subtract first and last value per restaurant_id in GroupBy.transform:

df1 = df.sort_values(['restaurant_id','annee'])
g = df1.groupby('restaurant_id')['moyenne_etoile_annee']
df1['diff'] = g.transform('last').sub(g.transform('first'))

Alternative with lambda function:

df1 = df.sort_values(['restaurant_id','annee'])
g = df1.groupby('restaurant_id')['moyenne_etoile_annee']
df1['diff'] = g.transform(lambda x: x.iat[-1] - x.iat[0])

Upvotes: 1

Related Questions