SereneWizard
SereneWizard

Reputation: 166

Extract data between two dates each year

I have a time series of daily data from 2000 to 2015. What I want is another single time series which only contains data from each year between April 15 to June 15 (because that is the period relevant for my analysis). I have already written a code to do the same myself, which is given below:

import pandas as pd
df = pd.read_table(myfilename, delimiter=",", parse_dates=['Date'], na_values=-99)
dff = df[df['Date'].apply(lambda x: x.month>=4 and x.month<=6)]
dff = dff[dff['Date'].apply(lambda x: x.day>=15 if x.month==4 else True)]
dff = dff[dff['Date'].apply(lambda x: x.day<=15 if x.month==6 else True)]

I think this code is too much ineffecient as it has to carry out operation on the dataframe 3 times to get the desired subset.

I would like to know the following two things:

  1. Is there an inbuilt pandas function to achieve this?
  2. If not, is there a more efficient and better way to achieve this?

Upvotes: 0

Views: 1281

Answers (2)

Haleemur Ali
Haleemur Ali

Reputation: 28253

let the data frame look like this:

df = pd.DataFrame({'Date': pd.date_range('2000-01-01', periods=365*10, freq='D'),
                   'Value': np.random.random(365*10)})

create a series of dates with the year set to the same value

x = df.Date.apply(lambda x: pd.datetime(2000,x.month, x.day))

filter using this series to select from the dataframe

df.values[(x >= pd.datetime(2000,4,15)) & (x <= pd.datetime(2000,6,15))]

Upvotes: 2

HYRY
HYRY

Reputation: 97291

try this:

index = pd.date_range("2000/01/01", "2016/01/01")
s = index.to_series()
s[(s.dt.month * 100 + s.dt.day).between(415, 615)]

Upvotes: 3

Related Questions