Houssem Gharbi
Houssem Gharbi

Reputation: 33

Pandas merging with condition on columns

Greeting all,

Does anybody know how to join two dataframes according specific behavior using pandas please using pandas no other libraries.

like df1 inner join df2 where df1.t < df2.t ..

Upvotes: 1

Views: 151

Answers (1)

Fakher Mokadem
Fakher Mokadem

Reputation: 1099

Do it in sql, it comes with Python's standard library.

from sqlite3 import sqlite3
import pandas

# define your dataframes here 
df1 = ...
df2 = ...

# load the dataframes to memory
sql_ptr = sqlite3.connect(':memory:')
df1.to_sql('df1', sql_ptr)
df2.to_sql('df2', sql_ptr)

# execute the query
df3 = pd.read_sql_query("select * from df1 inner join df2 on <insert columns to join on> where df1.ts < df2.ts", sql_ptr)

And please keep in mind that this query will do two different steps:

  1. An inner join ==> an intersection on the specified columns
  2. filter the result table on the given condition (! given on other columns !)

An inner join in relational algebra is an intersection between two sets ==> There is no such thing as inner join on condition between columns (other than the implied equality condition)

e.g. this query "select * from df1 inner join df2 on df1.ts = df2.ts where df1.ts < df2.ts" will yield an empty view cause inner join will find an empty intersection between the tables df1 and df2.

Upvotes: 1

Related Questions