prof32
prof32

Reputation: 157

Compute weekly averages of columns in time series data

I have time series data with an id column and some continues value column. I want to calculate the moving weekly averages of this value per each person, in a new column. Code generate a sample dataset:

import pandas as pd
import numpy as np
df = pd.DataFrame(index=pd.date_range(freq=f'{60}T',start='2020-01-01',periods=(1)*24*14))
df['col'] = np.random.random_integers(0, 250, size= df.shape[0])
df['uid'] = 1
df2 = pd.DataFrame(index=pd.date_range(freq=f'{60}T',start='2020-01-01',periods=(1)*24*14))
df2['col'] = np.random.random_integers(0, 150, size= df2.shape[0])
df2['uid'] = 2
df3=pd.concat([df, df2]).reset_index()
df3

This sample has 2 weeks of data per person, so there should be 2 average values per person. The first one will be the average of the first week, the second average will be the average of the two weeks (week1 average + week2 average)/2. Then, fill all the rows in the column with the average value of that week.

Real dataset is large, so I am looking for a solution that can be scaled. How can this be achieved?

desired outcome should look like this:

          index      uid   col    week_average
2020-01-01 00:00:00    1   104    week1_uid1_mean
2020-01-01 01:00:00    1   150    week1_uid1_mean
2020-01-01 02:00:00    1   243    week1_uid1_mean 
....
2020-01-08 00:00:00    1   174    (week1_uid1_mean+week2_uid1_mean)/2
2020-01-08 01:00:00    1   24     (week1_uid1_mean+week2_uid1_mean)/2
... 

Upvotes: 1

Views: 150

Answers (1)

Lewan
Lewan

Reputation: 98

First, compute the week index for each row

df3["week"] = (
    df3["index"] - df3.groupby("uid")["index"].transform("min")
) // pd.Timedelta(7, unit="day")

Or if the values in the index column are identical for all persons (uid), directly

df3["week"] = (df3["index"] - df3["index"].min()) // pd.Timedelta(
    7, unit="day"
)

Then, compute the week average for each distinct couple of (uid, week)

week_averages = (
    df3.groupby(["uid", "week"])["col"]
    .mean()
    .groupby("uid")
    .apply(lambda x: x.rolling(len(x), min_periods=1).mean())
)

Finally, fill each row of your dataframe with the corresponding week average

df3["week_average"] = df3.apply(
    lambda x: week_averages.loc[(x["uid"], x["week"])], axis=1
)

On your data, when using %timeit, I get

39.5 ms ± 335 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Upvotes: 1

Related Questions