Reputation: 166
I have a time series of daily data from 2000 to 2015. What I want is another single time series which only contains data from each year between April 15 to June 15 (because that is the period relevant for my analysis). I have already written a code to do the same myself, which is given below:
import pandas as pd
df = pd.read_table(myfilename, delimiter=",", parse_dates=['Date'], na_values=-99)
dff = df[df['Date'].apply(lambda x: x.month>=4 and x.month<=6)]
dff = dff[dff['Date'].apply(lambda x: x.day>=15 if x.month==4 else True)]
dff = dff[dff['Date'].apply(lambda x: x.day<=15 if x.month==6 else True)]
I think this code is too much ineffecient as it has to carry out operation on the dataframe 3 times to get the desired subset.
I would like to know the following two things:
Upvotes: 0
Views: 1281
Reputation: 28253
let the data frame look like this:
df = pd.DataFrame({'Date': pd.date_range('2000-01-01', periods=365*10, freq='D'),
'Value': np.random.random(365*10)})
create a series of dates with the year set to the same value
x = df.Date.apply(lambda x: pd.datetime(2000,x.month, x.day))
filter using this series to select from the dataframe
df.values[(x >= pd.datetime(2000,4,15)) & (x <= pd.datetime(2000,6,15))]
Upvotes: 2
Reputation: 97291
try this:
index = pd.date_range("2000/01/01", "2016/01/01")
s = index.to_series()
s[(s.dt.month * 100 + s.dt.day).between(415, 615)]
Upvotes: 3