Kshitij Yadav

Reputation: 1387

How to calculate percent change of each month, from median of previous months

I have data that looks like this:

Group  YearMonth  PageViews  Users
A        202001      100        10
A        202002      120        9
B        202002      150        12
A        202003       90        10
B        202003      120        15
C        202001      130        10

I want to find the percentage difference of each month from the median of all previous months' usage, within each group. For example:

  1. For the first row of each group, the result will be NA (each group might have a different first date; for example, group A starts at 202001 and group B starts at 202002).
  2. For group A, the only month before 202002 is 202001, so the median is just the 202001 value. The percentage difference for 202002 will be 20% for PageViews and -10% for Users compared to 202001.
  3. Similarly, for 202003 the median of the previous months (202001 and 202002 combined) is 110 for PageViews ((100 + 120) / 2 = 110) and 9.5 for Users, so the percentage difference for 202003 will be about -18% for PageViews and about +5% for Users, and so on.
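
The expected numbers for group A can be checked by hand before reaching for pandas. The following is only a small sketch using the standard library; the helper name `pct_change_from_median` and the rounding to one decimal are assumptions made for illustration.

```python
from statistics import median

def pct_change_from_median(history, current):
    """Percent change of `current` relative to the median of `history`."""
    if not history:
        return None  # first month of a group: nothing to compare against
    base = median(history)
    return round((current / base - 1) * 100, 1)

# Group A values for 202001, 202002, 202003 (taken from the sample data)
page_views_a = [100, 120, 90]
users_a = [10, 9, 10]

# For each month, compare it with the median of all earlier months
pv_changes = [pct_change_from_median(page_views_a[:i], v)
              for i, v in enumerate(page_views_a)]
user_changes = [pct_change_from_median(users_a[:i], v)
                for i, v in enumerate(users_a)]

print(pv_changes)    # [None, 20.0, -18.2]
print(user_changes)  # [None, -10.0, 5.3]
```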

How can I calculate this using Python? Any help would be appreciated. Thank you.

Upvotes: 0

Views: 619

Answers (2)

hilo

Reputation: 116

By grouping on the Group column, you can shift the PageViews values within each group.

  1. Sort the data in descending order, so the last row becomes the first one
df = df.sort_index(ascending=False)
  2. Shift the views within each group
df["PageViews_1"] = df.groupby("Group")["PageViews"].shift(1)

That way, each row also holds the next month's record. In the end, you can simply calculate the mean as

 df['mean'] = (df["PageViews_1"] + df['PageViews']) / 2

For the median, since the shifted values now sit next to the original values in each row, you can calculate it row by row.

 # restrict to the two PageViews columns so Group, YearMonth, etc. are not included
 df['median'] = df[['PageViews', 'PageViews_1']].median(axis=1)
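
As a rough, self-contained illustration of the shift idea on the sample data: this sketch sorts ascending by YearMonth so that `shift(1)` yields the previous month rather than the next one. That ordering and the column names `PageViews_prev` and `pair_median` are assumptions, not part of the answer above. Note that a single shift only looks one month back, so it is not the median of all previous months.

```python
import pandas as pd

df = pd.DataFrame({
    "Group":     ["A", "A", "B", "A", "B", "C"],
    "YearMonth": [202001, 202002, 202002, 202003, 202003, 202001],
    "PageViews": [100, 120, 150, 90, 120, 130],
})

# Order rows chronologically within each group before shifting
df = df.sort_values(["Group", "YearMonth"])

# Previous month's PageViews within the same group (NaN for the first month)
df["PageViews_prev"] = df.groupby("Group")["PageViews"].shift(1)

# Row-wise median of the current and previous value; NaN is skipped by default
df["pair_median"] = df[["PageViews", "PageViews_prev"]].median(axis=1)

print(df.sort_index())
```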

Upvotes: 0

Ben.T

Reputation: 29635

You can use the expanding method to get the median of all previous values, then shift the result to align it with the following YearMonth. Do this per Group using groupby.

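# get the expanding median of the two columns per group, then shift by one row
# group_keys=False keeps the original index so the result aligns with df
median_prev = (
    df.sort_values('YearMonth')
      .groupby('Group', group_keys=False)
      [['PageViews','Users']]
      .apply(lambda x: x.expanding().median().shift())
)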
print(median_prev.sort_index())
#    PageViews  Users
# 0        NaN    NaN
# 1      100.0   10.0
# 2        NaN    NaN
# 3      110.0    9.5
# 4      150.0   12.0
# 5        NaN    NaN

Then compute the percentage difference. I assume you want:

# create the two columns; no sort_index is needed here,
# because pandas aligns on index and columns automatically
df[[f'%change_{col}' for col in ['PageViews','Users']]] = \
    ((df[['PageViews','Users']]/median_prev-1)*100).round(1)

print(df)
#   Group  YearMonth  PageViews  Users  %change_PageViews  %change_Users
# 0     A     202001        100     10                NaN            NaN
# 1     A     202002        120      9               20.0          -10.0
# 2     B     202002        150     12                NaN            NaN
# 3     A     202003         90     10              -18.2            5.3
# 4     B     202003        120     15              -20.0           25.0
# 5     C     202001        130     10                NaN            NaN
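
For reference, the whole approach can be put together as one runnable script. This is a sketch under a few assumptions: the sample DataFrame is rebuilt from the question, `groupby(...).transform` is swapped in for `apply` (it always returns a result aligned to the original index), and the names `cols`, `pct` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Group":     ["A", "A", "B", "A", "B", "C"],
    "YearMonth": [202001, 202002, 202002, 202003, 202003, 202001],
    "PageViews": [100, 120, 150, 90, 120, 130],
    "Users":     [10, 9, 12, 10, 15, 10],
})
cols = ["PageViews", "Users"]

# Median of all earlier months within each group:
# expanding().median() includes the current row, so shift() moves it one row down
median_prev = (
    df.sort_values("YearMonth")
      .groupby("Group")[cols]
      .transform(lambda s: s.expanding().median().shift())
)

# Percent change relative to that median; division aligns on index labels
pct = ((df[cols] / median_prev - 1) * 100).round(1)

# Attach the new columns by index label
result = df.join(pct.add_prefix("%change_"))
print(result)
```

The first month of every group has no earlier data, so its percent change comes out as NaN, which matches the NA requirement in the question.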

Upvotes: 4
