Reputation: 17
Hello Community,
in my dataframe i have three columns as follow:
restaurant_id, annee, moyenne_etoile_annee.
for each restaurant_id i would like to substract the "moyenne_etoiles_annee' of the last year minus the one from the first year.
By example for the first restaurant_id df[diff]=moyenne_etoiles_annee(2017)-moyenne_etoiles_annee(2015) 2-2.66=-0.66
I tried to .agg and select first and last value but wasn't able to execute properly.
Sample:
df = pd.DataFrame({"restaurant_id": [1,1,2,2],
"annee": [2015,2016,2019,2022],
"moyenne_etoile_annee": [1.2,1.4,1.3,1.3]})
Upvotes: 1
Views: 519
Reputation: 862591
Solution working well if unique years per restaurant_id
like in question data.
First use DataFrame.sort_values
per both columns (if necessary) for first and last annee
per restaurant_id
, then subtract first
and last
value per restaurant_id
in GroupBy.transform
:
df1 = df.sort_values(['restaurant_id','annee'])
g = df1.groupby('restaurant_id')['moyenne_etoile_annee']
df1['diff'] = g.transform('last').sub(g.transform('first'))
Alternative with lambda function:
df1 = df.sort_values(['restaurant_id','annee'])
g = df1.groupby('restaurant_id')['moyenne_etoile_annee']
df1['diff'] = g.transform(lambda x: x.iat[-1] - x.iat[0])
Upvotes: 1