Ryan1234
Ryan1234

Reputation: 29

calculating % difference across grouped pandas dataframe

I have calculated the difference between two months of values on grouped IDs in a dataframe, I now new to calculate the % difference but am receiving keyerrors or 'NaN' for all of my attempts using .pct_change().

My datafrane is constructed as below;

ID MONTH TOTAL Variance
REF1 1 55 0
REF1 2 65 10
REF1 3 66 1
REF2 7 65 -15
REF2 6 80 10
REF2 8 100 35

I now need to add another column depicting the % Variance between the previous month.

    df_all = ( 
            pd.concat(pd.read_excel("testdata.xlsx",
                                    header=None, sheet_name=None), ignore_index=True)
                .T.set_index(0).T
                .rename_axis(None, axis=1)
                .loc[:, ["ID", "DATE", "TOTAL"]]
                .assign(MONTH= lambda x: x["DATE"].dt.month)
                .groupby(by=["ID", "MONTH"], as_index=False).agg({"TOTAL":sum})
         )

df_all['Variance'] = df_all['TOTAL'].sub(
               df_all[['ID', 'MONTH']]
               .merge(df_all.assign(MONTH=df_all['MONTH'].add(1)),
                      how='left')['TOTAL']
              )
df_all['Variance'].fillna(0, inplace=True)

Upvotes: 0

Views: 46

Answers (1)

eshirvana
eshirvana

Reputation: 24568

pct_chaange works just fine for given sample data:

df['%'] = df.sort_values(['ID','MONTH']).groupby(['ID'])['TOTAL'].pct_change() * 100

output :

     ID  MONTH  TOTAL  Variance          %
0  REF1      1     55         0        NaN
1  REF1      2     65        10  18.181818
2  REF1      3     66         1   1.538462
3  REF2      7     65       -15 -18.750000
4  REF2      6     80        10        NaN
5  REF2      8    100        35  53.846154

Upvotes: 2

Related Questions