How to calculate percent change of each month, from median of previous months

Question

I have data that looks like this:

Group  YearMonth  PageViews  Users
A        202001      100        10
A        202002      120        9
B        202002      150        12
A        202003       90        10
B        202003      120        15
C        202001      130        10

I want to find the percentage difference of each new month from the median of all previous months' usage, within each group. For example:

For the first row of each group the result will be NA (each group may have a different first date: group A starts at 202001 and group B at 202002).
For group A, the only month before 202002 is 202001, so the median is just the 202001 values. The percentage difference for 202002 is therefore +20% for PageViews and -10% for Users.
Similarly, for 202003 the median of the previous months (202001 and 202002 combined) is 110 for PageViews ((100 + 120) / 2) and 9.5 for Users, so the percentage difference for 202003 is about -18% for PageViews and +5% for Users, and so on.

How can I compute this using Python? Any help would be appreciated. Thank you.

Ben.T · Accepted Answer

You can use the expanding method to get the median of all values up to each row, then shift the result by one row so it lines up with the following YearMonth. Do this per Group using groupby.

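# expanding median of the two columns within each group, shifted by one row
# so that each month only sees the months before it
median_prev = (
    df.sort_values('YearMonth')
      # group_keys=False keeps the original row index on pandas >= 2.0
      .groupby('Group', group_keys=False)
      [['PageViews','Users']]
      .apply(lambda x: x.expanding().median().shift())
)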
print(median_prev.sort_index())
#    PageViews  Users
# 0        NaN    NaN
# 1      100.0   10.0
# 2        NaN    NaN
# 3      110.0    9.5
# 4      150.0   12.0
# 5        NaN    NaN
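
The snippet above assumes a DataFrame named `df` already exists. Here is a minimal, self-contained sketch that rebuilds the sample data from the question and reproduces the table above. The `group_keys=False` argument is my addition, on the assumption that you are on pandas 2.0 or later, where apply would otherwise prepend the group key to the index.

```python
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    "Group": ["A", "A", "B", "A", "B", "C"],
    "YearMonth": [202001, 202002, 202002, 202003, 202003, 202001],
    "PageViews": [100, 120, 150, 90, 120, 130],
    "Users": [10, 9, 12, 10, 15, 10],
})

cols = ["PageViews", "Users"]

# per group: median of all rows so far, pushed down one row so that
# each month is compared only with the months before it
median_prev = (
    df.sort_values("YearMonth")
      .groupby("Group", group_keys=False)[cols]
      .apply(lambda g: g.expanding().median().shift())
      .sort_index()
)
print(median_prev)
```

The first row of every group has no earlier month, so it comes out as NaN.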

Then compute the percentage difference. I assume you want (value / previous median - 1) * 100:

# create the two columns; no sort_index is needed because pandas
# aligns on index and column labels automatically
df[[f'%change_{col}' for col in ['PageViews','Users']]] = \
    ((df[['PageViews','Users']]/median_prev-1)*100).round(1)

print(df)
   Group  YearMonth  PageViews  Users  %change_PageViews  %change_Users
0     A     202001        100     10                NaN            NaN
1     A     202002        120      9               20.0          -10.0
2     B     202002        150     12                NaN            NaN
3     A     202003         90     10              -18.2            5.3
4     B     202003        120     15              -20.0           25.0
5     C     202001        130     10                NaN            NaN
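
If you would rather avoid apply, the same idea can be sketched with transform, which returns a result indexed like its input. The helper name `pct_change_from_prev_median` and its `group` and `order` parameters are hypothetical (not part of pandas), and the sample frame is rebuilt from the question so the block runs on its own.

```python
import pandas as pd


def pct_change_from_prev_median(df, cols, group="Group", order="YearMonth"):
    """Hypothetical helper: % change of each row versus the median of all
    earlier rows in the same group (NaN for a group's first row)."""
    prev_median = (
        df.sort_values(order)
          .groupby(group)[cols]
          # expanding median per group, shifted so a row never sees itself
          .transform(lambda s: s.expanding().median().shift())
    )
    # division aligns on the row index, so the sort above does not reorder df
    return ((df[cols] / prev_median - 1) * 100).round(1)


# sample data copied from the question
df = pd.DataFrame({
    "Group": ["A", "A", "B", "A", "B", "C"],
    "YearMonth": [202001, 202002, 202002, 202003, 202003, 202001],
    "PageViews": [100, 120, 150, 90, 120, 130],
    "Users": [10, 9, 12, 10, 15, 10],
})

result = pct_change_from_prev_median(df, ["PageViews", "Users"])
print(result)
```

Keeping the calculation in a function makes it easy to reuse on other column lists without touching the original frame.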
