Reputation: 29
I have calculated the difference between two months of values on grouped IDs in a dataframe, I now new to calculate the % difference but am receiving keyerrors or 'NaN' for all of my attempts using .pct_change().
My datafrane is constructed as below;
ID | MONTH | TOTAL | Variance |
---|---|---|---|
REF1 | 1 | 55 | 0 |
REF1 | 2 | 65 | 10 |
REF1 | 3 | 66 | 1 |
REF2 | 7 | 65 | -15 |
REF2 | 6 | 80 | 10 |
REF2 | 8 | 100 | 35 |
I now need to add another column depicting the % Variance between the previous month.
df_all = (
pd.concat(pd.read_excel("testdata.xlsx",
header=None, sheet_name=None), ignore_index=True)
.T.set_index(0).T
.rename_axis(None, axis=1)
.loc[:, ["ID", "DATE", "TOTAL"]]
.assign(MONTH= lambda x: x["DATE"].dt.month)
.groupby(by=["ID", "MONTH"], as_index=False).agg({"TOTAL":sum})
)
df_all['Variance'] = df_all['TOTAL'].sub(
df_all[['ID', 'MONTH']]
.merge(df_all.assign(MONTH=df_all['MONTH'].add(1)),
how='left')['TOTAL']
)
df_all['Variance'].fillna(0, inplace=True)
Upvotes: 0
Views: 46
Reputation: 24568
pct_chaange
works just fine for given sample data:
df['%'] = df.sort_values(['ID','MONTH']).groupby(['ID'])['TOTAL'].pct_change() * 100
output :
ID MONTH TOTAL Variance %
0 REF1 1 55 0 NaN
1 REF1 2 65 10 18.181818
2 REF1 3 66 1 1.538462
3 REF2 7 65 -15 -18.750000
4 REF2 6 80 10 NaN
5 REF2 8 100 35 53.846154
Upvotes: 2