Reputation: 1315
I am performing an outer join on two DataFrames:
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
'date': [4, 5, 6, 7, 8],
'str': ['a', 'b', 'c', 'd', 'e']})
df2 = pd.DataFrame({'id': [1, 2, 3, 4, 6],
'date': [4, 5, 6, 7, 8],
'str': ['A', 'B', 'C', 'D', 'Q']})
pd.merge(df1, df2, on=["id","date"], how="outer")
This gives the result
date id str_x str_y
0 4 1 a A
1 5 2 b B
2 6 3 c C
3 7 4 d D
4 8 5 e NaN
5 8 6 NaN Q
Is it possible to perform the outer join such that the str
-columns are concatenated? In other words, how to perform the join such that we get the DataFrame
date id str
0 4 1 aA
1 5 2 bB
2 6 3 cC
3 7 4 dD
4 8 5 e
5 8 6 Q
where all NaN
have been set to None
.
Upvotes: 2
Views: 50
Reputation: 294586
If 'id', 'date'
is unique in each data frame, then you can set the index and add the dataframes.
icols = ['date', 'id']
df1.set_index(icols).add(df2.set_index(icols), fill_value='').reset_index()
date id str
0 4 1 aA
1 5 2 bB
2 6 3 cC
3 7 4 dD
4 8 5 e
5 8 6 Q
Upvotes: 2
Reputation: 863801
I think not, possible solution is replace NaN
s and join together:
df = (pd.merge(df1, df2, on=["id","date"], how="outer", suffixes=('','_'))
.assign(str=lambda x: x['str'].fillna('') + x['str_'].fillna(''))
.drop('str_', 1))
Similar alternative:
df = (pd.merge(df1, df2, on=["id","date"], how="outer", suffixes=('','_'))
.assign(str=lambda x: x.filter(like='str').fillna('').values.sum(axis=1))
.drop('str_', 1))
print (df)
date id str
0 4 1 aA
1 5 2 bB
2 6 3 cC
3 7 4 dD
4 8 5 e
5 8 6 Q
Upvotes: 2