Luckasino
Luckasino

Reputation: 424

Select date period in calendar year for several years by Pandas

I would like to select date period in calendar for several years. So let's say from March 25th to August 25th every year. I was testing this approach: Filtering data for multiple years by date range given by months and days in pandas dataframe, however it is not working for me. Format of the date is datetime64[ns], so I am not quite sure what is wrong. I was testing different approaches and always end up working only with month selection without day limitations.

Here is my code snippet:

np.random.seed(0)
data = pd.DataFrame({'
    date': pd.date_range('1990-01-01', freq='d', periods=10000),
    'yield': np.random.randn(10000).cumsum(),
    'simulation': np.random.randn(10000).cumsum(),
    'Predicted': np.random.randn(10000).cumsum()})
out:
        date           yield    simulation  Predicted
0       1990-01-01     1.764     -0.202     0.330
1       1990-01-02     2.164     -1.035     0.330
2       1990-01-03     3.143     0.698      1.148
3       1990-01-04     5.384     0.889      1.576
4       1990-01-05     7.251     0.711     -0.928
...............................................
9995    2017-05-14  -186.162   111.432    -56.764
9996    2017-05-15  -186.119   111.323    -57.349
9997    2017-05-16  -185.602   111.266    -58.861
9998    2017-05-17  -185.635   110.207    -57.884
9999    2017-05-18  -184.337   109.880    -56.628

Upvotes: 0

Views: 1101

Answers (3)

top talent
top talent

Reputation: 599

You can use query function:

data.query("date >= '1990-03-25' and date <= '1991-08-25'")

Output:

          date     yield  simulation  Predicted
83  1990-03-25  3.871510    1.124867  -0.641264
84  1990-03-26  3.463583    1.308551  -1.727582
85  1990-03-27  2.730999    0.265739  -2.836030
86  1990-03-28  2.689405   -0.602710  -4.748007
87  1990-03-29  2.678018   -3.826888  -4.159625
..         ...       ...         ...        ...
597 1991-08-21 -1.645160    6.778537  -6.120725
598 1991-08-22 -2.304405    6.092513  -6.671160
599 1991-08-23 -3.955274    5.144826  -6.719812
600 1991-08-24 -3.554355    4.038627  -6.316459
601 1991-08-25 -4.692877    3.711540  -6.723403

Upvotes: 0

Henry Ecker
Henry Ecker

Reputation: 35626

Try creating a Boolean index for each month's dates, then oring them together

import numpy as np
import pandas as pd

np.random.seed(0)
data = pd.DataFrame({
    'date': pd.date_range('1990-01-01', freq='d', periods=10000),
    'yield': np.random.randn(10000).cumsum(),
    'simulation': np.random.randn(10000).cumsum(),
    'Predicted': np.random.randn(10000).cumsum()
})

march_dates = (
        data['date'].dt.month.eq(3) &
        data['date'].dt.day.between(25, 31)
)
april_dates = (
        data['date'].dt.month.eq(4) &
        data['date'].dt.day.between(1, 25)
)
m = march_dates | april_dates

filtered_df = data[m]
print(filtered_df)

filtered_df:

           date       yield  simulation  Predicted
83   1990-03-25   -3.778287   11.467327   5.179931
84   1990-03-26   -2.290035    9.555657   7.388960
85   1990-03-27   -0.394146   10.490462   8.468163
86   1990-03-28    0.784634   11.336408   9.594991
87   1990-03-29    0.604709   11.454873   9.741437
...         ...         ...         ...        ...
9972 2017-04-21 -185.894256  103.212971 -62.513663
9973 2017-04-22 -186.199773  101.885143 -63.431743
9974 2017-04-23 -186.804921  101.908402 -63.195393
9975 2017-04-24 -186.580100  100.244993 -62.907841
9976 2017-04-25 -187.618573  100.814717 -62.071467

[896 rows x 4 columns]

Upvotes: 1

Peter
Peter

Reputation: 12305

Unfortunately you are not providing a reproducable example.

  1. You should make the date column the DataFrame index since this is the whole idea with DataFrames.
  2. Then you can easily apply the .month() and .day() methods to the index and filter them to your needs.
  3. If you're not familar with DataFrame indices, read the docs first,
    e.g https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html

Upvotes: 0

Related Questions