Eric D. Brown D.Sc.
Eric D. Brown D.Sc.

Reputation: 1956

Select rows that lie within datetime intervals

I'm trying to compare two dataframes and drop rows from the first dataframe that aren't between the dates in the second dataframe (or...selecting those rows that are between the dates in the 2nd dataframe). The selections should be inclusive. This might be really simple but its just not clicking for me right now.

Example data is below. For dataframe 1, this can be generated using daily data starting July 1 2018 and ending November 30 2018 with random numbers in the 'number' column. The ... in the dataframe 1 are meant used to show skipping data but the data is there in the real dataframe.

Dataframe 1:

               Number
Date
2018-07-01     15.2
2018-07-02     17.3
2018-07-03     19.5
2018-07-04     13.7
2018-07-05     19.1
...
2018-09-15     30.4
2018-09-16     25.7
2018-09-17     21.2
2018-09-18     19.7
2018-09-19     23.4
...
2018-11-01     30.8
2018-11-02     47.2
2018-11-03     25.3
2018-11-04     39.7
2018-11-05     43.8

Dataframe 2:

              Change
Date
2018-07-02     Start
2018-07-04     End
2018-09-16     Start
2018-09-18     End
2018-11-02     Start
2018-11-04     End

With the example above, the output should be:

               Number
Date
2018-07-02     17.3
2018-07-03     19.5
2018-07-04     13.7
2018-09-16     25.7
2018-09-17     21.2
2018-09-18     19.7
2018-11-02     47.2
2018-11-03     25.3
2018-11-04     39.7

Upvotes: 0

Views: 199

Answers (2)

cs95
cs95

Reputation: 402483

You can build an IntervalIndex from df2's index and search in logarithmic time.

df2.index = pd.to_datetime(df2.index)
idx = pd.IntervalIndex.from_arrays(df2.index[df.Change == 'Start'], 
                                   df2.index[df.Change == 'End'],
                                   closed='both')

df1[idx.get_indexer(pd.to_datetime(df1.index)) > -1]

            Number
Date              
2018-07-02    17.3
2018-07-03    19.5
2018-07-04    13.7
2018-09-16    25.7
2018-09-17    21.2
2018-09-18    19.7
2018-11-02    47.2
2018-11-03    25.3
2018-11-04    39.7

Upvotes: 2

Bharath M Shetty
Bharath M Shetty

Reputation: 30605

You can try this, I hope the Start and End comes one after the other and is sorted.

df3 = pd.concat([df[i:j] for i,j in zip(df2.loc[df2['Change']=='Start'].index, df2.loc[df2['Change']=='End'].index)]))
             Number
Date              
2018-07-02    17.3
2018-07-03    19.5
2018-07-04    13.7
2018-09-16    25.7
2018-09-17    21.2
2018-09-18    19.7
2018-11-02    47.2
2018-11-03    25.3
2018-11-04    39.7

Upvotes: 2

Related Questions