Subset Pandas DataFrame based on annual returning period covering multiple months

Question

This question is similar to Selecting Pandas DataFrame records for many years based on month & day range, but both the question and answer doesn't seem to cover my case

import pandas as pd
import numpy as np

rng = pd.date_range('2010-1-1', periods=1000, freq='D')
df = pd.DataFrame(np.random.randn(len(rng)), index=rng, columns=['A'])
df.head()

                   A
2010-01-01  1.098302
2010-01-02 -1.384821
2010-01-03 -0.426329
2010-01-04 -0.587967
2010-01-05 -0.853374

Now I would like to subset my DataFrame based on an annual returning period for every year. A period can for example be defined as from February 15th to October 3rd

startMM, startdd = (2,15)
endMM, enddd = (10,3)

Now I tried to to slice my multi-year DataFrame based on this period:

subset = df[((df.index.month == startMM) & (startdd <= df.index.day) 
             | (df.index.month == endMM) & (df.index.day <= enddd))]

but this returns only the months as is defined in the startMM and endMM and not the actual period between the dates. Any help is kindly appreciated.

subset.index.month.unique()

Int64Index([2, 10], dtype='int64')

IanS · Accepted Answer

I would create a column of (month, day) tuples:

month_day = pd.concat([
                df.index.to_series().dt.month, 
                df.index.to_series().dt.day
            ], axis=1).apply(tuple, axis=1)

You can then compare them directly:

df[(month_day >= (startMM, startdd)) & (month_day <= (endMM, enddd))]

Subset Pandas DataFrame based on annual returning period covering multiple months

Answers (2)

Related Questions