P Spence
P Spence

Reputation: 391

Merging two dataframes based on a date between two other dates without a common column

I have two dataframes that I need to merge based on whether or not a date value fits in between two other dates. Basically, I need to perform an outer join where B.event_date is between A.start_date and A.end_date. It seems that merge and join always assume a common column which in this case, I do not have.

    A                           B
    start_date  end_date        event_date  price
0   2017-03-27  2017-04-20  0   2017-01-20  100
1   2017-01-10  2017-02-01  1   2017-01-27  200

Result 
    start_date  end_date        event_date  price
0   2017-03-27  2017-04-20  
1   2017-01-10  2017-02-01      2017-01-20  100
2   2017-01-10  2017-02-01      2017-01-27  200

Upvotes: 26

Views: 28804

Answers (2)

sammywemmy
sammywemmy

Reputation: 28659

conditional_join from pyjanitor handles inequality joins efficiently, and also handles overlaps:

# pip install pyjanitor
import pandas as pd
import janitor

Reusing @scottboston's data :

df_B.conditional_join(
        df_A, 
        ('event_date', 'start_date', '>='), 
        ('event_date', 'end_date', '<='), 
        how = 'right'
    )

        left             right
  event_date  price start_date   end_date
0        NaT    NaN 2017-03-27 2017-04-20
1 2017-01-20  100.0 2017-01-10 2017-02-01
2 2017-01-27  200.0 2017-01-10 2017-02-01

Under the hood, it uses binary search, which is more efficient than a Cartesian join. Note that if the intervals do not overlap, pd.IntervalIndex is another efficient solution.

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153460

Create data and format to datetimes:

df_A = pd.DataFrame({'start_date':['2017-03-27','2017-01-10'],'end_date':['2017-04-20','2017-02-01']})
df_B = pd.DataFrame({'event_date':['2017-01-20','2017-01-27'],'price':[100,200]})

df_A['end_date'] = pd.to_datetime(df_A.end_date)
df_A['start_date'] = pd.to_datetime(df_A.start_date)
df_B['event_date'] = pd.to_datetime(df_B.event_date)

Create keys to do a cross join:

New in pandas 1.2.0+ how='cross' instead of assigning psuedo keys:

df_merge = df_A.merge(df_B, how='cross')

Else, with pandas < 1.2.0 use psuedo key to merge on 'key'

df_A = df_A.assign(key=1)
df_B = df_B.assign(key=1)
df_merge = pd.merge(df_A, df_B, on='key').drop('key',axis=1)

Filter out records that do not meet criteria of event dates between start and end dates:

df_merge = df_merge.query('event_date >= start_date and event_date <= end_date')

Join back to original date range table and drop key column

df_out = df_A.merge(df_merge, on=['start_date','end_date'], how='left').fillna('').drop('key', axis=1)

print(df_out)

Output:

              end_date           start_date           event_date price
0  2017-04-20 00:00:00  2017-03-27 00:00:00                           
1  2017-02-01 00:00:00  2017-01-10 00:00:00  2017-01-20 00:00:00   100
2  2017-02-01 00:00:00  2017-01-10 00:00:00  2017-01-27 00:00:00   200

Upvotes: 34

Related Questions