azal

Reputation: 1260

Weighted average in pandas DataFrame with specific condition

I want to dynamically create an exponentially decaying moving average that gives more weight to recent measurements. For instance, if I have 5 requests and those 5 requests come from the last 4 weeks, I want to compute the exponential average dynamically over those 4 weeks (row 1). However, if those 5 requests come from 4 weeks but some of the weeks appear more than once, I want to modify the exponential average so that duplicate weeks are not treated as separate ones and assigned wrong weights. My measurements are in weeks. Example DataFrame:

id  requests  day_of_week  hour  weeks
1   5         3            21    [1,2,3,4]
2   5         3            22    [2,2,3,4]

Expected output:
id  requests  day_of_week  hour  weeks      output
1   0         3            21    [1,2,3,4]  see_function
2   5         3            22    [2,2,3,4]  see_function

I am defining the weighted mean function as follows:
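import numpy as np

# lambda function to compute the weighted mean
# (weights grow towards the end of the list, i.e. the most recent week):
r = 0.5
a = 1.0
# [::-1] returns the reversed list; list.reverse() works in place and returns None
wm = lambda x: np.average(x, weights=[a * r ** i for i in range(len(x))][::-1])

# Series.apply works element-wise, so no axis argument is needed
df['output'] = df['weeks'].apply(wm)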

However, this is wrong because it treats every week (duplicated or not) exactly the same. I am looking for a clever solution that can tell when weeks are duplicated and so avoid allocating fictitious weights.

The weighted average I have posted assumes a constant half-life that depends only on the number of measurements and does not take any of this into account. Assuming the frequency dict of weeks is {2: 2, 3: 1, 4: 1}, I would like to exploit the frequencies of appearance to tweak my weighted average so that it pays even more attention to the recent weeks than it already does.
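For reference, here is a minimal sketch of how such a frequency dict could be built from one row's weeks list, assuming collections.Counter from the standard library. The names weeks and week_counts are only illustrative.

```python
from collections import Counter

# One row's "weeks" list from the example DataFrame
weeks = [2, 2, 3, 4]

# Counter maps each distinct week to the number of times it appears
week_counts = dict(Counter(weeks))
print(week_counts)  # {2: 2, 3: 1, 4: 1}
```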

Upvotes: 0

Views: 194

Answers (1)

Divyaansh Bajpai

Reputation: 232

If you want to remove duplicates from the weeks list, you can do something like the following: add a new column to your DataFrame and calculate your weighted average on top of it.

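import pandas as pd
df = pd.DataFrame({'id': [1, 4], 'weeks': [[1, 2, 3, 4], [2, 2, 3, 4]]})
# sorted() keeps the distinct weeks in ascending order; a bare set has no guaranteed order
df['DistinctWeeks'] = df['weeks'].apply(lambda x: sorted(set(x)))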

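Building on the de-duplication idea, one possible way to compute the average is to let each weight depend on how old the week is rather than on its position in the list, so a repeated week cannot shift the weights of the others. This is only a sketch under assumptions: like the wm lambda in the question it averages the week values themselves, the function name decayed_week_average and the use_counts flag are hypothetical, and multiplying by the frequencies is just one way to use them.

```python
from collections import Counter

import numpy as np
import pandas as pd

R = 0.5  # decay rate per week, as in the question
A = 1.0  # base weight, as in the question


def decayed_week_average(weeks, r=R, a=A, use_counts=False):
    """Weighted mean of the distinct weeks; weights decay with week age."""
    counts = Counter(weeks)
    distinct = np.array(sorted(counts))   # each week appears once
    age = distinct.max() - distinct       # 0 for the most recent week
    weights = a * r ** age                # older weeks get smaller weights
    if use_counts:
        # Optional assumption: scale each weight by how often the week appears
        weights = weights * np.array([counts[w] for w in distinct])
    return float(np.average(distinct, weights=weights))


df = pd.DataFrame({"id": [1, 2], "weeks": [[1, 2, 3, 4], [2, 2, 3, 4]]})
df["output"] = df["weeks"].apply(decayed_week_average)
print(df)
```

With r = 0.5, the row [2, 2, 3, 4] gives about 3.43 when duplicates are simply collapsed, and 3.25 with use_counts=True, because week 2 then carries twice its decayed weight.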

Upvotes: 1
