Extract data between two dates each year

Question

I have a time series of daily data from 2000 to 2015. What I want is another single time series which only contains data from each year between April 15 to June 15 (because that is the period relevant for my analysis). I have already written a code to do the same myself, which is given below:

import pandas as pd
df = pd.read_table(myfilename, delimiter=",", parse_dates=['Date'], na_values=-99)
dff = df[df['Date'].apply(lambda x: x.month>=4 and x.month<=6)]
dff = dff[dff['Date'].apply(lambda x: x.day>=15 if x.month==4 else True)]
dff = dff[dff['Date'].apply(lambda x: x.day<=15 if x.month==6 else True)]

I think this code is too much ineffecient as it has to carry out operation on the dataframe 3 times to get the desired subset.

I would like to know the following two things:

Is there an inbuilt pandas function to achieve this?
If not, is there a more efficient and better way to achieve this?

Haleemur Ali · Accepted Answer

let the data frame look like this:

df = pd.DataFrame({'Date': pd.date_range('2000-01-01', periods=365*10, freq='D'),
                   'Value': np.random.random(365*10)})

create a series of dates with the year set to the same value

x = df.Date.apply(lambda x: pd.datetime(2000,x.month, x.day))

filter using this series to select from the dataframe

df.values[(x >= pd.datetime(2000,4,15)) & (x <= pd.datetime(2000,6,15))]

Extract data between two dates each year

Answers (2)

Related Questions