Morten_DK
Morten_DK

Reputation: 229

Pandas join by greater than or less than

I have two tables A) Contains "Entry number" and "Amount" B) Contains "From Entry number" and "To Entry Number" and "Created Date"

I would then like to join the two, but I don't have a specific join key as table A has specific "Entry number" - for instance "1000", but table B has intervals, for instance "From Entry number" "900" and "To Entry number" "1100". In SQL I would be able to join them as:

select * from table A 
left join table B
on a.[Entry number] >= b.[From Entry number] and a.[Entry number] <= b.[To Entry number]

But how can I achieve that is Pandas?

Upvotes: 4

Views: 2885

Answers (3)

Ufos
Ufos

Reputation: 3305

pandas merge_asof (docs) essentially solves this.


Example:

timestamps_df_1 = pd.DataFrame({
    'approx_time': pd.to_datetime(['2023-08-09 14:15:00', '2023-07-01 09:55:00', '2023-07-02 15:55:00']),
    'event_id_1': [1, 2, 3],
})

timestamps_df_2 = pd.DataFrame({
    'approx_time': pd.to_datetime(['2023-08-09 13:59:00', '2023-07-01 10:15:00', '2023-07-02 11:55:00']),
    'event_id_2': [1, 2, 3]
})

pd.merge_asof(
    timestamps_df_1.sort_values(by='approx_time'),
    timestamps_df_2.sort_values(by='approx_time'),
    on='approx_time',
    tolerance=pd.Timedelta('2h'),
    direction='nearest',
)

result

          approx_time  event_id_1  event_id_2
0 2023-07-01 09:55:00           2         2.0
1 2023-07-02 15:55:00           3         NaN
2 2023-08-09 14:15:00           1         1.0

Upvotes: 0

Jorge Lopez
Jorge Lopez

Reputation: 467

 import pandas as pd
 import sqldf 
    
    dfa = pd.DataFrame({"EntryNumber": [1000, 50],
                        "Amount":[4,2]})
    dfb = pd.DataFrame({"FromEntryNumber":[900, 60],
                        "ToEntryNumber":[1100, 100],
                        "CreateDate":["2021-01-01", "2020-01-01"]})
    
    sqldf.run(
        """
            select * 
            from dfa  
            join dfb on dfa.EntryNumber >= dfb.FromEntryNumber and 
            dfa.EntryNumber<=dfb.ToEntryNumber
        """
    )

Using sqldf will allow you to use SQL code in python.

Upvotes: 0

Paul
Paul

Reputation: 1897

Disclamer: This solution could be quite memory-heavy.

I'm using the following two dataframes:

import pandas as pd
dfa = pd.DataFrame({"Entry Number": [1000, 50],
                    "Amount":[4,2]})
dfb = pd.DataFrame({"From Entry Number":[900, 60],
                    "To Entry Number":[1100, 100],
                    "Create Date":["2021-01-01", "2020-01-01"]})

dfa:

Entry Number Amount
1000 4
50 2

dfb:

From Entry Number To Entry Number Create Date
900 1100 2021-01-01
60 100 2020-01-01

First create a Cartesian (cross) Joined dataframe:

df = dfa.merge(dfb, how = 'cross')

This creates the following joined dataframe:

Entry Number Amount From Entry Number To Entry Number Create Date
1000 4 900 1100 2021-01-01
1000 4 60 100 2020-01-01
50 2 900 1100 2021-01-01
50 2 60 100 2020-01-01

Now you can filter the dataframe based on your conditions:

df[
    (df["Entry Number"] >= df["From Entry Number"]) &
    (df["Entry Number"] <= df["To Entry Number"])
  ]

End result:

Entry Number Amount From Entry Number To Entry Number Create Date
1000 4 900 1100 2021-01-01

Upvotes: 2

Related Questions