Reputation: 1260
I want to dynamically create an exponentially decaying moving average that gives more weight to recent measurements. For instance, if I have 5 requests and those 5 requests come from the last 4 weeks, I want to build the exponential average dynamically from those 4 weeks (row 1). However, if those 5 requests come from 4 week entries but some of the weeks appear more than once, then I want to somehow modify the exponential average so that it does not treat the duplicate weeks as separate ones and assign wrong weights. My measurements are in weeks. Example DataFrame:
id  requests  day_of_week  hour  weeks
1   5         3            21    [1,2,3,4]
2   5         3            22    [2,2,3,4]
Expected output:
id  requests  day_of_week  hour  weeks      output
1   0         3            21    [1,2,3,4]  see_function
2   5         3            22    [2,2,3,4]  see_function
I am defining the weighted mean function as follows:
# lambda function to compute the weighted mean:
import numpy as np

r = 0.5
a = 1.0
# list.reverse() reverses in place and returns None, so slice with [::-1] instead
wm = lambda x: np.average(x, weights=[a * r ** i for i in range(len(x))][::-1])
# df['weeks'] is a Series, so apply takes no axis argument
df['output'] = df['weeks'].apply(wm)
Nevertheless, what I'm doing is wrong, as it treats every week (duplicated or not) exactly the same. I am trying to find a clever solution that can detect duplicate weeks and avoid assigning them fictitious weights.
The weighted average I have posted assumes a constant half-life that depends only on the length of the measurements and does not take what I want into account. Assuming the dict of week frequencies is {2: 2, 3: 1, 4: 1}, I would somehow exploit the frequencies of appearance to tweak my weighted average to pay even more attention to the recent weeks than it already does.
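One sketch of that idea (my own variant, not from the question): collapse the duplicates first so each distinct week receives exactly one decay weight, with the most recent week getting the largest weight. The helper name `decayed_mean` is hypothetical:

```python
import numpy as np

r = 0.5  # decay ratio per step back in time
a = 1.0  # weight of the most recent week

def decayed_mean(weeks):
    # Each distinct week gets exactly one weight, so duplicates such as
    # [2, 2, 3, 4] do not inflate the weight mass of older weeks.
    distinct = sorted(set(weeks))
    # Reversed so the largest (most recent) week gets weight a,
    # the next one a*r, then a*r**2, ...
    weights = [a * r ** i for i in range(len(distinct))][::-1]
    return np.average(distinct, weights=weights)

# [2, 2, 3, 4] collapses to [2, 3, 4] with weights [0.25, 0.5, 1.0]
print(decayed_mean([2, 2, 3, 4]))
```

For `[2, 2, 3, 4]` this yields (2·0.25 + 3·0.5 + 4·1.0) / 1.75 ≈ 3.43, whereas the original code would have given week 2 two separate weights.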
Upvotes: 0
Views: 194
Reputation: 232
If you want to get rid of duplicates in the weeks list, then you can do something like the below, add the new column to your DataFrame, and calculate your weighted average on top of it.
import pandas as pd

df = pd.DataFrame({'id': [1, 4], 'weeks': [[1, 2, 3, 4], [2, 2, 3, 4]]})
df['DistinctWeeks'] = df['weeks'].apply(lambda x: list(set(x)))
Output:
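Putting the deduplication together with the question's weighted mean (with the `.reverse()` bug fixed), a full sketch might look like the following. Note the use of `sorted(set(x))` rather than `list(set(x))`, since set iteration order is not guaranteed and the decaying weights rely on chronological order:

```python
import numpy as np
import pandas as pd

r = 0.5
a = 1.0
# fixed weighted mean: slice with [::-1] instead of .reverse(), which returns None
wm = lambda x: np.average(x, weights=[a * r ** i for i in range(len(x))][::-1])

df = pd.DataFrame({'id': [1, 4], 'weeks': [[1, 2, 3, 4], [2, 2, 3, 4]]})
# sorted(set(x)) keeps the distinct weeks in chronological order
df['DistinctWeeks'] = df['weeks'].apply(lambda x: sorted(set(x)))
df['output'] = df['DistinctWeeks'].apply(wm)
```

Row 2 then averages over `[2, 3, 4]` instead of `[2, 2, 3, 4]`, so the duplicate week no longer receives a second weight.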
Upvotes: 1