Reputation: 1956
I'm trying to compare two dataframes and drop rows from the first dataframe that aren't between the dates in the second dataframe (or...selecting those rows that are between the dates in the 2nd dataframe). The selections should be inclusive. This might be really simple but its just not clicking for me right now.
Example data is below. For dataframe 1, this can be generated using daily data starting July 1 2018 and ending November 30 2018 with random numbers in the 'number' column. The ...
in the dataframe 1 are meant used to show skipping data but the data is there in the real dataframe.
Dataframe 1:
Number
Date
2018-07-01 15.2
2018-07-02 17.3
2018-07-03 19.5
2018-07-04 13.7
2018-07-05 19.1
...
2018-09-15 30.4
2018-09-16 25.7
2018-09-17 21.2
2018-09-18 19.7
2018-09-19 23.4
...
2018-11-01 30.8
2018-11-02 47.2
2018-11-03 25.3
2018-11-04 39.7
2018-11-05 43.8
Dataframe 2:
Change
Date
2018-07-02 Start
2018-07-04 End
2018-09-16 Start
2018-09-18 End
2018-11-02 Start
2018-11-04 End
With the example above, the output should be:
Number
Date
2018-07-02 17.3
2018-07-03 19.5
2018-07-04 13.7
2018-09-16 25.7
2018-09-17 21.2
2018-09-18 19.7
2018-11-02 47.2
2018-11-03 25.3
2018-11-04 39.7
Upvotes: 0
Views: 199
Reputation: 402483
You can build an IntervalIndex
from df2
's index and search in logarithmic time.
df2.index = pd.to_datetime(df2.index)
idx = pd.IntervalIndex.from_arrays(df2.index[df.Change == 'Start'],
df2.index[df.Change == 'End'],
closed='both')
df1[idx.get_indexer(pd.to_datetime(df1.index)) > -1]
Number
Date
2018-07-02 17.3
2018-07-03 19.5
2018-07-04 13.7
2018-09-16 25.7
2018-09-17 21.2
2018-09-18 19.7
2018-11-02 47.2
2018-11-03 25.3
2018-11-04 39.7
Upvotes: 2
Reputation: 30605
You can try this, I hope the Start and End comes one after the other and is sorted.
df3 = pd.concat([df[i:j] for i,j in zip(df2.loc[df2['Change']=='Start'].index, df2.loc[df2['Change']=='End'].index)]))
Number
Date
2018-07-02 17.3
2018-07-03 19.5
2018-07-04 13.7
2018-09-16 25.7
2018-09-17 21.2
2018-09-18 19.7
2018-11-02 47.2
2018-11-03 25.3
2018-11-04 39.7
Upvotes: 2