Reputation: 675
Fairly new to pandas so I'm struggling with this.
I have a larger DataFrame whose records are indexed by a MultiIndex that includes a datetime level (EntryDate), and a smaller DataFrame indexed by start and end dates, both of which are datetime levels as well. Here's what they look like:
Larger DataFrame:
                       Data
PatId EntryDate  Id
725   2005-01-03 1422     X
      2005-01-04 1563     X
      2005-01-05 1355     X
      2005-01-06 118      X
      2005-01-09 1400     X
And the smaller one containing the date ranges:
                        PatId
EntryDate  ExitDate
2005-01-15 2005-04-15   22407
2005-01-30 2005-04-30   95938
2005-02-07 2005-05-07  116812
2005-02-18 2005-05-18   12163
2005-02-21 2005-05-21   22908
I'd like an elegant and efficient way to filter the larger DataFrame to only include those records that fall within the date ranges defined in the smaller DataFrame.
Upvotes: 2
Views: 76
Reputation: 862681
You can use:
import numpy as np
import pandas as pd

EntryDate = df2.index.get_level_values('EntryDate')
ExitDate = df2.index.get_level_values('ExitDate')
# every day covered by any range, endpoints included
idx = np.concatenate([pd.date_range(s, e) for s, e in zip(EntryDate, ExitDate)])
df = df1[df1.index.get_level_values('EntryDate').isin(np.unique(idx))]
Explanation:
First select the levels of the MultiIndex by get_level_values.
Then create the date_ranges in a loop and join them together.
Last, filter by isin with boolean indexing, only by unique datetimes.
Upvotes: 1
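For the isin approach in the answer above, here is a self-contained sketch that can be run as is. The sample frames and their values are made up for illustration (the rows in the question all fall before the first range, so they would filter down to nothing), and the names starts, ends, days, mask and filtered are not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical larger frame: MultiIndex (PatId, EntryDate, Id), one Data column.
df1 = pd.DataFrame(
    {"Data": ["a", "b", "c", "d"]},
    index=pd.MultiIndex.from_tuples(
        [
            (725, pd.Timestamp("2005-01-03"), 1422),
            (725, pd.Timestamp("2005-01-20"), 1563),
            (725, pd.Timestamp("2005-03-01"), 1355),
            (725, pd.Timestamp("2005-06-01"), 118),
        ],
        names=["PatId", "EntryDate", "Id"],
    ),
)

# Hypothetical smaller frame: MultiIndex (EntryDate, ExitDate) holding the ranges.
df2 = pd.DataFrame(
    {"PatId": [22407, 95938]},
    index=pd.MultiIndex.from_tuples(
        [
            (pd.Timestamp("2005-01-15"), pd.Timestamp("2005-04-15")),
            (pd.Timestamp("2005-01-30"), pd.Timestamp("2005-04-30")),
        ],
        names=["EntryDate", "ExitDate"],
    ),
)

starts = df2.index.get_level_values("EntryDate")
ends = df2.index.get_level_values("ExitDate")

# Every calendar day covered by at least one range (both endpoints inclusive).
days = np.unique(np.concatenate([pd.date_range(s, e) for s, e in zip(starts, ends)]))

# Boolean mask over the rows of df1, True where EntryDate is one of those days.
mask = df1.index.get_level_values("EntryDate").isin(days)
filtered = df1[mask]
print(filtered)
```

Because pd.date_range(s, e) produces one timestamp per day at midnight, isin only matches EntryDate values that carry no time-of-day component. If the real data has times, normalizing the level first (for example with .normalize()) would probably be needed.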
Reputation: 3103
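You can also slice the larger DataFrame (df1 here) once per range and concatenate the pieces:
s = df1.sort_index()  # label slicing on a MultiIndex needs a sorted index
pd.concat([s.loc[pd.IndexSlice[:, start:end, :], :] for start, end in df2.index])
Explanation: EntryDate and ExitDate are index levels of df2, so iterating over df2.index yields (start, end) pairs. Each pair selects the rows whose EntryDate level falls in that range, and pd.concat stacks the results. If the ranges overlap, the same row appears more than once.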
Upvotes: 1
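For the slice-and-concatenate idea in the second answer, the following is a hedged sketch rather than the answerer's exact code. It assumes the index layout shown in the question, uses pd.IndexSlice to slice the EntryDate level of the rows, and adds a de-duplication step because the example ranges overlap. The sample values and the names pieces, combined and result are hypothetical.

```python
import pandas as pd

# Hypothetical larger frame: MultiIndex (PatId, EntryDate, Id), one Data column.
df1 = pd.DataFrame(
    {"Data": ["a", "b", "c", "d", "e"]},
    index=pd.MultiIndex.from_tuples(
        [
            (725, pd.Timestamp("2005-01-03"), 1422),
            (725, pd.Timestamp("2005-01-20"), 1563),
            (725, pd.Timestamp("2005-03-01"), 1355),
            (725, pd.Timestamp("2005-04-20"), 118),
            (725, pd.Timestamp("2005-06-01"), 1400),
        ],
        names=["PatId", "EntryDate", "Id"],
    ),
)

# Hypothetical smaller frame: MultiIndex (EntryDate, ExitDate) holding the ranges.
df2 = pd.DataFrame(
    {"PatId": [22407, 95938]},
    index=pd.MultiIndex.from_tuples(
        [
            (pd.Timestamp("2005-01-15"), pd.Timestamp("2005-04-15")),
            (pd.Timestamp("2005-01-30"), pd.Timestamp("2005-04-30")),
        ],
        names=["EntryDate", "ExitDate"],
    ),
)

# Label-based slicing on a MultiIndex requires a sorted index.
s = df1.sort_index()

# Iterating over df2.index yields (start, end) tuples, one per range.
# The slice start:end applies to the second level (EntryDate), endpoints inclusive.
pieces = [s.loc[pd.IndexSlice[:, start:end, :], :] for start, end in df2.index]
combined = pd.concat(pieces)

# Overlapping ranges select the same row more than once; keep the first copy.
result = combined[~combined.index.duplicated()]
print(result)
```

With many ranges this builds one intermediate frame per range, so the single boolean mask from the isin answer is likely the cheaper option. The slicing version is mainly useful when each range's rows are needed separately.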