Reputation: 1801
I am trying to merge two data frames where df1 column A contains substring of df2 column B.
for example
df1:
columnA Actual
AB Apple
CD Straw
EF Ele
df2:
ColumnB Values
UserAB 1
UserCD 2
UserEF 3
UserGH 4
UserIJ 5
I am looking for output like below, can't able to identify how can I get this done.
Output:
ColumnB Values ColumnA Actual
UserAB 1 AB Apple
UserCD 2 CD Straw
UserEF 3 EF Ele
UserGH 4 NaN NaN
UserIJ 5 NaN NaN
Upvotes: 0
Views: 423
Reputation: 122
You can use df.append
:
df1 = df1.append(pd.DataFrame([[float('nan'), float('nan')], [float('nan'), float('nan')]], columns = ["ColumnA", "Actual"])).reset_index(drop=True)
df2 = df2.merge(df1, how = "left", on = df2.index).drop("key_0", axis = 1)
output:
ColumnB Values ColumnA Actual
0 UserAB 1 AB Apple
1 UserCD 2 CD Straw
2 UserEF 3 EF Ele
3 UserGH 4 NaN NaN
4 UserIJ 5 NaN NaN
Upvotes: 0
Reputation: 75100
You can try with str.extract()
:
p='({})'.format('|'.join(df1.columnA))
d=df1.set_index('columnA')['Actual']
s=df2.ColumnB.str.extract(p,expand=False)
df2=df2.assign(ColumnA=s,Actual=s.map(d))
ColumnB Values ColumnA Actual
0 UserAB 1 AB Apple
1 UserCD 2 CD Straw
2 UserEF 3 EF Ele
3 UserGH 4 NaN NaN
4 UserIJ 5 NaN NaN
Upvotes: 1