Reputation: 4807
I have a datafrme df1
as:
Type StDt EnDt
A 1/2/2012 1/4/2012
B 1/6/2012 1/6/2012
I have another dataframe df2
with all dates until 2019 as:
KBWI
Date
2012-01-02 45.00
2012-01-03 32.00
2012-01-04 14.00
2012-01-05 26.00
2012-01-06 27.00
For each row in df1
, I need to use the date range StDt, EnDt to extract all rows from df2 and take its minimum to get the following:
Type StDt EnDt Minimum
A 1/2/2012 1/4/2012 14.00
B 1/6/2012 1/6/2012 27.00
I am not sure how to do it efficiently as dataframes are large.
Upvotes: 2
Views: 56
Reputation: 28709
An old question; pd.merge_asof
should work nicely here, especially as one of the dataframes just has two rows (end goal is a minimum):
(pd.merge_asof(df2, df1, left_index=True, right_on='StDt')
.query('Date <= EnDt')
.groupby('Type')
.min()
)
Out[204]:
KBWI StDt EnDt
Type
A 14.0 2012-01-02 2012-01-04
B 27.0 2012-01-06 2012-01-06
Upvotes: 0
Reputation: 402852
Perliminary preparation: conversion of all involved columns and indices to datetime
.
df[['StDt', 'EnDt']] = df[['StDt', 'EnDt']].apply(pd.to_datetime, errors='coerce')
df2.index = pd.to_datetime(df2.index, errors='coerce')
df
Type StDt EnDt
0 A 2012-01-02 2012-01-04
1 B 2012-01-06 2012-01-06
df2
KBWI
Date
2012-01-02 45.0
2012-01-03 32.0
2012-01-04 14.0
2012-01-05 26.0
2012-01-06 27.0
One simple method is to use pd.IntervalIndex
and groupby
to find the minimum:
idx = pd.IntervalIndex.from_arrays(df['StDt'], df['EnDt'], closed='both')
df['Minimum'] = df2['KBWI'].groupby(idx.get_indexer_non_unique(df2.index)).min()
df
Type StDt EnDt Minimum
0 A 2012-01-02 2012-01-04 14.0
1 B 2012-01-06 2012-01-06 27.0
This works assuming df
's index is also a RangeIndex
(numeric, monotonically increasing).
Upvotes: 2