Reputation: 1387
I have data that looks like this:
Group YearMonth PageViews Users
A 202001 100 10
A 202002 120 9
B 202002 150 12
A 202003 90 10
B 202003 120 15
C 202001 130 10
I want to find the percentage difference of each new month from the median of the previous month's usage, within each group. For example,
How can I find this using Python? Any help would be appreciated. Thank you.
Upvotes: 0
Views: 619
Reputation: 116
Grouping by the Group column, you can shift PageViews within each group.
df = df.sort_index(ascending=False)
# shift within each group; groupby(...).shift keeps the original index
df["PageViews_1"] = df.groupby("Group")["PageViews"].shift(1)
That way, each row also holds the next month's record for its group. In the end, you can simply calculate the mean as
df['mean']=(df["PageViews_1"]+df['PageViews'])/2
For the median, since each row now has the shifted value next to the current one, you can calculate it row by row over those two columns.
df['median'] = df[['PageViews', 'PageViews_1']].median(axis=1)
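As a rough, self-contained sketch of the shift idea, the snippet below rebuilds the sample data from the question and compares each month with the single previous month of the same group. Note that this is a simpler comparison than the median of all previous months, and the column names PageViews_prev and pct_vs_prev are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Group": ["A", "A", "B", "A", "B", "C"],
    "YearMonth": [202001, 202002, 202002, 202003, 202003, 202001],
    "PageViews": [100, 120, 150, 90, 120, 130],
    "Users": [10, 9, 12, 10, 15, 10],
})

# Sort chronologically so that shift(1) means "previous month" within a group.
df = df.sort_values(["Group", "YearMonth"])

# Previous month's PageViews for the same group (NaN for a group's first month).
# PageViews_prev is a hypothetical column name.
df["PageViews_prev"] = df.groupby("Group")["PageViews"].shift(1)

# Percentage difference from the previous month only (not from a median).
df["pct_vs_prev"] = ((df["PageViews"] / df["PageViews_prev"] - 1) * 100).round(1)

print(df)
```

The first month of every group has nothing to compare against, so it stays NaN.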
Upvotes: 0
Reputation: 29635
You can use the expanding method to get the median of all previous values and shift the result to align it with the following YearMonth. Do this per Group using groupby.
# get the expanding median of the two columns and shift it by one row
median_prev = (
    df.sort_values('YearMonth')
      .groupby('Group', group_keys=False)  # do not add the group keys to the index
      [['PageViews','Users']]
      .apply(lambda x: x.expanding().median().shift())
)
print(median_prev.sort_index())
# PageViews Users
# 0 NaN NaN
# 1 100.0 10.0
# 2 NaN NaN
# 3 110.0 9.5
# 4 150.0 12.0
# 5 NaN NaN
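To see what the expanding-then-shift step does in isolation, here is a small sketch on group A's PageViews only (values taken from the question, in YearMonth order). The variable names are just for illustration.

```python
import pandas as pd

# Group A's PageViews for 202001, 202002, 202003.
page_views_a = pd.Series([100, 120, 90])

# Median of everything seen so far, including the current row.
running_median = page_views_a.expanding().median()

# Shift by one row so each month only sees the months before it.
median_of_previous = running_median.shift()

print(running_median.tolist())
print(median_of_previous.tolist())
```

The first entry of the shifted series is NaN because the first month has no history, which is why the first row of every group is NaN in the output above.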
Then compute the percentage difference in whatever form you need. I assume you want:
# create the two columns; no need for sort_index here,
# pandas aligns on index and column labels automatically
df[[f'%change_{col}' for col in ['PageViews','Users']]] = \
    ((df[['PageViews','Users']]/median_prev-1)*100).round(1)
print(df)
Group YearMonth PageViews Users %change_PageViews %change_Users
0 A 202001 100 10 NaN NaN
1 A 202002 120 9 20.0 -10.0
2 B 202002 150 12 NaN NaN
3 A 202003 90 10 -18.2 5.3
4 B 202003 120 15 -20.0 25.0
5 C 202001 130 10 NaN NaN
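For reference, the whole approach can also be sketched as one self-contained function. This variant uses groupby with transform instead of apply, which is an assumption on my part rather than what the answer above does; it returns a result aligned to the original index. The function name pct_change_from_prev_median and its parameters are hypothetical.

```python
import pandas as pd


def pct_change_from_prev_median(df, group_col, time_col, value_cols):
    """Hypothetical helper: % difference of each row from the median of
    all earlier rows in the same group, ordered by time_col."""
    out = df.copy()
    ordered = out.sort_values(time_col)
    for col in value_cols:
        # Expanding median per group, shifted so the current row is excluded.
        prev_median = ordered.groupby(group_col)[col].transform(
            lambda s: s.expanding().median().shift()
        )
        # Division aligns on the index, so the sort order does not matter here.
        out[f"%change_{col}"] = ((out[col] / prev_median - 1) * 100).round(1)
    return out


# Sample data copied from the question.
df = pd.DataFrame({
    "Group": ["A", "A", "B", "A", "B", "C"],
    "YearMonth": [202001, 202002, 202002, 202003, 202003, 202001],
    "PageViews": [100, 120, 150, 90, 120, 130],
    "Users": [10, 9, 12, 10, 15, 10],
})

result = pct_change_from_prev_median(df, "Group", "YearMonth", ["PageViews", "Users"])
print(result)
```

On the sample data this should reproduce the table above, with NaN for the first month of each group.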
Upvotes: 4