DKM
DKM

Reputation: 1801

join in two pandas dataframe based on substring of one dataframe

I am trying to merge two data frames where df1 column A contains substring of df2 column B.

for example

df1:
columnA Actual
AB      Apple
CD      Straw
EF      Ele

df2:
ColumnB  Values
UserAB    1
UserCD    2
UserEF    3
UserGH    4 
UserIJ    5

I am looking for output like below, can't able to identify how can I get this done.

Output:
   ColumnB  Values  ColumnA  Actual
    UserAB    1      AB       Apple
    UserCD    2      CD       Straw
    UserEF    3      EF       Ele
    UserGH    4      NaN      NaN
    UserIJ    5      NaN      NaN

Upvotes: 0

Views: 423

Answers (2)

mk_
mk_

Reputation: 122

You can use df.append:

df1 = df1.append(pd.DataFrame([[float('nan'), float('nan')], [float('nan'), float('nan')]], columns = ["ColumnA", "Actual"])).reset_index(drop=True)
df2 = df2.merge(df1, how = "left", on = df2.index).drop("key_0", axis = 1)

output:

  ColumnB Values ColumnA Actual
0  UserAB      1      AB  Apple
1  UserCD      2      CD  Straw
2  UserEF      3      EF    Ele
3  UserGH      4     NaN    NaN
4  UserIJ      5     NaN    NaN

Upvotes: 0

anky
anky

Reputation: 75100

You can try with str.extract():

p='({})'.format('|'.join(df1.columnA))
d=df1.set_index('columnA')['Actual']
s=df2.ColumnB.str.extract(p,expand=False)

df2=df2.assign(ColumnA=s,Actual=s.map(d))

  ColumnB  Values ColumnA Actual
0  UserAB       1      AB  Apple
1  UserCD       2      CD  Straw
2  UserEF       3      EF    Ele
3  UserGH       4     NaN    NaN
4  UserIJ       5     NaN    NaN

Upvotes: 1

Related Questions