spacemud

Reputation: 675

Filter a larger DataFrame using date ranges from a smaller one

I'm fairly new to pandas, so I'm struggling with this.

I have a larger DataFrame whose records are indexed by a MultiIndex that includes a datetime level (EntryDate), and a smaller DataFrame indexed by start and end dates (EntryDate and ExitDate), both of which are datetime levels as well. Here's what they look like:

Larger DataFrame:

                       Data
PatId EntryDate  Id                                        
725   2005-01-03 1422  X
      2005-01-04 1563  X
      2005-01-05 1355  X
      2005-01-06 118   X
      2005-01-09 1400  X

And the smaller one containing the date ranges:

                         PatId
EntryDate  ExitDate          
2005-01-15 2005-04-15   22407
2005-01-30 2005-04-30   95938
2005-02-07 2005-05-07  116812
2005-02-18 2005-05-18   12163
2005-02-21 2005-05-21   22908

I'd like an elegant and efficient way to filter the larger DataFrame to only include those records that fall within the date ranges defined in the smaller DataFrame.

Upvotes: 2

Views: 76

Answers (2)

jezrael

Reputation: 862681

You can use:

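import numpy as np
import pandas as pd

# df1 is the larger DataFrame, df2 the one holding the date ranges
EntryDate = df2.index.get_level_values('EntryDate')
ExitDate = df2.index.get_level_values('ExitDate')

# every day covered by any range, then keep rows whose EntryDate is among them
idx = np.concatenate([pd.date_range(s, e) for s, e in zip(EntryDate, ExitDate)])
df = df1[df1.index.get_level_values('EntryDate').isin(np.unique(idx))]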

Explanation:

  1. First, get the values of each MultiIndex level with get_level_values.
  2. Create a date_range for each start/end pair in a loop and join them together.
  3. Finally, filter with boolean indexing using isin, checking only against the unique datetimes.
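
To make the three steps concrete, here is a minimal self-contained sketch. The sample frames are hypothetical (only modelled on the ones in the question), and it assumes the dates in df1 are whole days, since date_range produces midnight timestamps and isin matches exactly:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the frames in the question.
df1 = pd.DataFrame(
    {"Data": ["X"] * 5},
    index=pd.MultiIndex.from_arrays(
        [
            [725] * 5,
            pd.to_datetime(
                ["2005-01-03", "2005-01-14", "2005-01-15", "2005-03-01", "2005-06-01"]
            ),
            [1422, 1563, 1355, 118, 1400],
        ],
        names=["PatId", "EntryDate", "Id"],
    ),
)
df2 = pd.DataFrame(
    {"PatId": [22407, 95938]},
    index=pd.MultiIndex.from_arrays(
        [
            pd.to_datetime(["2005-01-15", "2005-01-30"]),
            pd.to_datetime(["2005-04-15", "2005-04-30"]),
        ],
        names=["EntryDate", "ExitDate"],
    ),
)

# Step 1: pull the start and end dates out of df2's MultiIndex.
starts = df2.index.get_level_values("EntryDate")
ends = df2.index.get_level_values("ExitDate")

# Step 2: expand every range into its days and drop the duplicates
# that overlapping ranges produce.
days = np.unique(np.concatenate([pd.date_range(s, e) for s, e in zip(starts, ends)]))

# Step 3: boolean mask on df1's EntryDate level.
mask = df1.index.get_level_values("EntryDate").isin(days)
filtered = df1[mask]
```

With this sample data, only the rows dated 2005-01-15 and 2005-03-01 fall inside a range, so filtered keeps those two rows in their original order.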

Upvotes: 1

iDrwish

Reputation: 3103

You can do a simple process like this:

pd.concat([df1.loc[(slice(None), slice(start, end)), :]  # df1's index must be sorted
           for start, end in df2.index])

Explanation

  • DataFrames with a datetime index can be sliced by label using datetime objects or strings that parse as datetimes
  • Slice the DataFrame into one piece per date range, then concatenate the pieces
  • The rest is a simple list comprehension
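
A runnable sketch of the slice-and-concatenate idea, under a few assumptions: the sample frames are hypothetical, the dates live in index levels (as in the question) rather than in columns, and the larger frame is sorted before slicing. Overlapping ranges pick up the same row more than once, so the sketch drops repeated index entries at the end:

```python
import pandas as pd

# Hypothetical sample data shaped like the frames in the question.
df1 = pd.DataFrame(
    {"Data": ["X"] * 5},
    index=pd.MultiIndex.from_arrays(
        [
            [725] * 5,
            pd.to_datetime(
                ["2005-01-03", "2005-01-14", "2005-01-15", "2005-03-01", "2005-06-01"]
            ),
            [1422, 1563, 1355, 118, 1400],
        ],
        names=["PatId", "EntryDate", "Id"],
    ),
)
df2 = pd.DataFrame(
    {"PatId": [22407, 95938]},
    index=pd.MultiIndex.from_arrays(
        [
            pd.to_datetime(["2005-01-15", "2005-01-30"]),
            pd.to_datetime(["2005-04-15", "2005-04-30"]),
        ],
        names=["EntryDate", "ExitDate"],
    ),
)

idx = pd.IndexSlice
df1_sorted = df1.sort_index()  # label slicing on a MultiIndex needs a sorted index

# One slice per (EntryDate, ExitDate) pair; iterating a MultiIndex yields tuples.
pieces = [df1_sorted.loc[idx[:, start:end], :] for start, end in df2.index]
result = pd.concat(pieces)

# The two ranges overlap, so the 2005-03-01 row was selected twice; keep one copy.
result = result[~result.index.duplicated()]
```

Compared with an isin mask, this builds one intermediate frame per range, so it is probably the slower option when df2 has many rows.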

Upvotes: 1
