Reputation: 229
I have two tables A) Contains "Entry number" and "Amount" B) Contains "From Entry number" and "To Entry Number" and "Created Date"
I would then like to join the two, but I don't have a specific join key as table A has specific "Entry number" - for instance "1000", but table B has intervals, for instance "From Entry number" "900" and "To Entry number" "1100". In SQL I would be able to join them as:
select * from table A
left join table B
on a.[Entry number] >= b.[From Entry number] and a.[Entry number] <= b.[To Entry number]
But how can I achieve that is Pandas?
Upvotes: 4
Views: 2885
Reputation: 3305
pandas merge_asof
(docs) essentially solves this.
timestamps_df_1 = pd.DataFrame({
'approx_time': pd.to_datetime(['2023-08-09 14:15:00', '2023-07-01 09:55:00', '2023-07-02 15:55:00']),
'event_id_1': [1, 2, 3],
})
timestamps_df_2 = pd.DataFrame({
'approx_time': pd.to_datetime(['2023-08-09 13:59:00', '2023-07-01 10:15:00', '2023-07-02 11:55:00']),
'event_id_2': [1, 2, 3]
})
pd.merge_asof(
timestamps_df_1.sort_values(by='approx_time'),
timestamps_df_2.sort_values(by='approx_time'),
on='approx_time',
tolerance=pd.Timedelta('2h'),
direction='nearest',
)
result
approx_time event_id_1 event_id_2
0 2023-07-01 09:55:00 2 2.0
1 2023-07-02 15:55:00 3 NaN
2 2023-08-09 14:15:00 1 1.0
Upvotes: 0
Reputation: 467
import pandas as pd
import sqldf
dfa = pd.DataFrame({"EntryNumber": [1000, 50],
"Amount":[4,2]})
dfb = pd.DataFrame({"FromEntryNumber":[900, 60],
"ToEntryNumber":[1100, 100],
"CreateDate":["2021-01-01", "2020-01-01"]})
sqldf.run(
"""
select *
from dfa
join dfb on dfa.EntryNumber >= dfb.FromEntryNumber and
dfa.EntryNumber<=dfb.ToEntryNumber
"""
)
Using sqldf will allow you to use SQL code in python.
Upvotes: 0
Reputation: 1897
Disclamer: This solution could be quite memory-heavy.
I'm using the following two dataframes:
import pandas as pd
dfa = pd.DataFrame({"Entry Number": [1000, 50],
"Amount":[4,2]})
dfb = pd.DataFrame({"From Entry Number":[900, 60],
"To Entry Number":[1100, 100],
"Create Date":["2021-01-01", "2020-01-01"]})
dfa:
Entry Number | Amount |
---|---|
1000 | 4 |
50 | 2 |
dfb:
From Entry Number | To Entry Number | Create Date |
---|---|---|
900 | 1100 | 2021-01-01 |
60 | 100 | 2020-01-01 |
First create a Cartesian (cross) Joined dataframe:
df = dfa.merge(dfb, how = 'cross')
This creates the following joined dataframe:
Entry Number | Amount | From Entry Number | To Entry Number | Create Date |
---|---|---|---|---|
1000 | 4 | 900 | 1100 | 2021-01-01 |
1000 | 4 | 60 | 100 | 2020-01-01 |
50 | 2 | 900 | 1100 | 2021-01-01 |
50 | 2 | 60 | 100 | 2020-01-01 |
Now you can filter the dataframe based on your conditions:
df[
(df["Entry Number"] >= df["From Entry Number"]) &
(df["Entry Number"] <= df["To Entry Number"])
]
End result:
Entry Number | Amount | From Entry Number | To Entry Number | Create Date |
---|---|---|---|---|
1000 | 4 | 900 | 1100 | 2021-01-01 |
Upvotes: 2