user3556757
user3556757

Reputation: 3619

Counting how many date-indexed rows exist during rolling window

Example is I have a pandas dataframe that represents all the days where I did an exercise during 2016. The date is represented as a DateTime64[ns].

What I want to generate is a timeseries from 1Jan2016 to 30Dec2016 where the value on an arbitrary day (example 30Mar2016) equals how many days during the preceeding six days that I performed the exercise [24Mar to 30Mar] One row per each day exercised would appear in the original dataframe.

I can't figure out a nice pythonic/vectorized way to define the new dataframe to do this counting. The index of it would be 1Jan2016 to 31Dec2016, but I can't see how to nicely define the 'count' column. I really want to avoid some nasty looping/iteration structure.

Upvotes: 0

Views: 41

Answers (1)

Riley Hun
Riley Hun

Reputation: 2785

You need to reindex your dataframe to all the days in a year, so you can get all the days that you didn't perform the exercise. Then use rolling_sum.

month = [np.random.randint(1,12) for _ in range(0,100)]
day = [np.random.randint(1,28) for _ in range(0,100)]
date = [datetime(2016, x, y) for x,y in zip(month, day)]
date = sorted(set(date))
values = [1]*len(date)
df = pd.DataFrame({'exercise': values}, index=date) # generate random data

all_dates = pd.date_range("1/1/2016", "12/31/2016")
df = df.reindex(all_dates) # reindex to get missing days
result = pd.rolling_sum(df, window=6, min_periods=1)

Upvotes: 0

Related Questions