Reputation: 424
I would like to select date period in calendar for several years. So let's say from March 25th to August 25th every year. I was testing this approach: Filtering data for multiple years by date range given by months and days in pandas dataframe, however it is not working for me. Format of the date is datetime64[ns], so I am not quite sure what is wrong. I was testing different approaches and always end up working only with month selection without day limitations.
Here is my code snippet:
np.random.seed(0)
data = pd.DataFrame({'
date': pd.date_range('1990-01-01', freq='d', periods=10000),
'yield': np.random.randn(10000).cumsum(),
'simulation': np.random.randn(10000).cumsum(),
'Predicted': np.random.randn(10000).cumsum()})
out:
date yield simulation Predicted
0 1990-01-01 1.764 -0.202 0.330
1 1990-01-02 2.164 -1.035 0.330
2 1990-01-03 3.143 0.698 1.148
3 1990-01-04 5.384 0.889 1.576
4 1990-01-05 7.251 0.711 -0.928
...............................................
9995 2017-05-14 -186.162 111.432 -56.764
9996 2017-05-15 -186.119 111.323 -57.349
9997 2017-05-16 -185.602 111.266 -58.861
9998 2017-05-17 -185.635 110.207 -57.884
9999 2017-05-18 -184.337 109.880 -56.628
Upvotes: 0
Views: 1101
Reputation: 599
You can use query function:
data.query("date >= '1990-03-25' and date <= '1991-08-25'")
Output:
date yield simulation Predicted
83 1990-03-25 3.871510 1.124867 -0.641264
84 1990-03-26 3.463583 1.308551 -1.727582
85 1990-03-27 2.730999 0.265739 -2.836030
86 1990-03-28 2.689405 -0.602710 -4.748007
87 1990-03-29 2.678018 -3.826888 -4.159625
.. ... ... ... ...
597 1991-08-21 -1.645160 6.778537 -6.120725
598 1991-08-22 -2.304405 6.092513 -6.671160
599 1991-08-23 -3.955274 5.144826 -6.719812
600 1991-08-24 -3.554355 4.038627 -6.316459
601 1991-08-25 -4.692877 3.711540 -6.723403
Upvotes: 0
Reputation: 35626
Try creating a Boolean index for each month's dates, then or
ing them together
import numpy as np
import pandas as pd
np.random.seed(0)
data = pd.DataFrame({
'date': pd.date_range('1990-01-01', freq='d', periods=10000),
'yield': np.random.randn(10000).cumsum(),
'simulation': np.random.randn(10000).cumsum(),
'Predicted': np.random.randn(10000).cumsum()
})
march_dates = (
data['date'].dt.month.eq(3) &
data['date'].dt.day.between(25, 31)
)
april_dates = (
data['date'].dt.month.eq(4) &
data['date'].dt.day.between(1, 25)
)
m = march_dates | april_dates
filtered_df = data[m]
print(filtered_df)
filtered_df
:
date yield simulation Predicted
83 1990-03-25 -3.778287 11.467327 5.179931
84 1990-03-26 -2.290035 9.555657 7.388960
85 1990-03-27 -0.394146 10.490462 8.468163
86 1990-03-28 0.784634 11.336408 9.594991
87 1990-03-29 0.604709 11.454873 9.741437
... ... ... ... ...
9972 2017-04-21 -185.894256 103.212971 -62.513663
9973 2017-04-22 -186.199773 101.885143 -63.431743
9974 2017-04-23 -186.804921 101.908402 -63.195393
9975 2017-04-24 -186.580100 100.244993 -62.907841
9976 2017-04-25 -187.618573 100.814717 -62.071467
[896 rows x 4 columns]
Upvotes: 1
Reputation: 12305
Unfortunately you are not providing a reproducable example.
Upvotes: 0