Reputation: 569
df_a
and df_b
are two dataframes that looks like following. They do NOT have same length.
--> df_a
col_1 col_2
a b
c
d
e
f g
--> df_b
col_3
a
c
d
e
I know I can achieve a merge with AND clause with :
df_a = pd.Dataframe({"col_1": ["a", "c", "d", None, "f"], "col_2": ["b", None, None, "e", "g"]})
df_b = pd.Dataframe({"col_3": ["a", "c", "d", "e"]})
pd.merge(df_a, df_b, how="inner", left_on=["col_1", "col_2"], right_on=["col_3", "col_3"])
My need is to merged them based on a OR clause like col_1 == col_3 OR col_2 == col_3
. Ideally I do not want to make two merges separately.
Appreciate any suggestions.
Upvotes: 2
Views: 265
Reputation: 261974
Use a double left merge
, first with 'col_1', then with 'col_2' and combine_first
the two outputs:
(df_a
.merge(df_b, left_on='col_1', right_on='col_3', how='left')
.combine_first(df_a.merge(df_b, left_on='col_2', right_on='col_3', how='left'))
)
output:
col_1 col_2 col_3 col_4
0 a b a 0.0
1 c None c 1.0
2 d None d 2.0
3 None e e 3.0
4 f g NaN NaN
used input (with extra column):
df_a = pd.DataFrame({"col_1": ["a", "c", "d", None, "f"],
"col_2": ["b", None, None, "e", "g"]})
df_b = pd.DataFrame({"col_3": ["a", "c", "d", "e"], "col_4": range(4)})
you can use functools.reduce
and a list of columns in order of the preferred match (modified input to see the difference, here 'col_2' has priority over 'col_1'):
cols = ['col_2', 'col_1']
from functools import reduce
out = reduce(lambda a,b: a.combine_first(b),
[df_a.merge(df_b, left_on=c, right_on='col_3', how='left')
for c in cols])
output:
col_1 col_2 col_3 col_4
0 a c c 1.0
1 c None c 1.0
2 d None d 2.0
3 None e e 3.0
4 f g NaN NaN
modified input:
df_a = pd.DataFrame({"col_1": ["a", "c", "d", None, "f"],
"col_2": ["c", None, None, "e", "g"]})
df_b = pd.DataFrame({"col_3": ["a", "c", "d", "e"], "col_4": range(4)})
Upvotes: 2