chasedcribbet
chasedcribbet

Reputation: 258

Timeseries sum of values divided by number of business days

Trying to get an understanding how to "standardize" time series data.

Below is example df:

string = np.random.choice(np.array([x for x in 'abcde']*100))

visits = np.random.poisson(1000, size=500)

date = pd.Series(pd.date_range("2020", freq="D", periods=500))

df = pd.DataFrame({'string': string, 'visits': visits, 'date': date})

I would like to be able to add a column for any time range selected. Two examples would be:

df[Avg_visits_month] = sum of visits in each month/ total US business days in that month

df[Avg_visits_quarter]=sum of visits in each quarter/total US business days in that quarter

Attempts have been made using resample (cut off days in first month), groupby(had trouble getting count of bus_days), transform(could get count of num visits per time period, but not the bus day count), numpy.busday_count. I just couldn't get there.

Upvotes: 0

Views: 50

Answers (2)

Frank Burdell
Frank Burdell

Reputation: 23

More features will get you through this.

df['day_of_week'] = df.date.weekday() # will return 1 - 7

This makes

df['avg_visits_month'] = len(df) / len(df[(df.day_of_week > 1) & (df.day_of_week < 6)])

Note: I'm assuming each record is a visit in this instance which I think is implied by your initial structure, but unsure.

As a semantic point, I do not think I would call this 'standardization' as much as feature development. Perhaps this is a over-focus on normalization concepts on my part, though.

There are a couple of other ways to do this, as you have described, but this is one way.

Upvotes: 0

Scott Boston
Scott Boston

Reputation: 153510

IIUC, let's use pd.Period and pd.bdate_range:

df['avg_visits_month'] = df['visits'] / df['date'].dt.to_period('M').apply(lambda x: len(pd.bdate_range(x.start_time, x.end_time)))
df['avg_visits_quarter'] = df['visits'] / df['date'].dt.to_period('Q').apply(lambda x: len(pd.bdate_range(x.start_time, x.end_time)))
print(df)

Output:

    string  visits       date  avg_visits_month  avg_visits_quarter
0        b     997 2020-01-01         43.347826           15.338462
1        b     967 2020-01-02         42.043478           14.876923
2        b     982 2020-01-03         42.695652           15.107692
3        b     978 2020-01-04         42.521739           15.046154
4        b     986 2020-01-05         42.869565           15.169231
..     ...     ...        ...               ...                 ...
495      b    1018 2021-05-10         48.476190           15.661538
496      b     970 2021-05-11         46.190476           14.923077
497      b    1042 2021-05-12         49.619048           16.030769
498      b     978 2021-05-13         46.571429           15.046154
499      b    1058 2021-05-14         50.380952           16.276923

Upvotes: 2

Related Questions