Reputation: 817
I have two dataframe that I would like to merge based on if column value from df2 contains column value from df1. I've been trying to use str.contains and series.isin. But no luck so far. Example below.
df1
Domain Visits
aaa 1
bbb 3
ddd 5
df2
Domain Ads
aaa.com 2
bbb 4
c.com 3
e.net 6
Merged Dataframe should be like this
mergeDF
Domain Visits Ads
aaa.com 1 2
bbb 3 4
c.com 3
ddd 5
e.net 6
Thx in advance.
Upvotes: 4
Views: 5334
Reputation: 51155
Setup
We can start by creating a new series which is the row each domain in df2
corresponds to in df1
, if such a row exists:
r = '({})'.format('|'.join(df1.Domain))
merge_df = df2.Domain.str.extract(r, expand=False).fillna(df2.Domain)
Our merge_df
now looks like this:
0 aaa
1 bbb
2 c.com
3 e.net
Name: Domain, dtype: object
Now we merge. We want an outer merge here, merging our first DataFrame on the Domain
column, and our second DataFrame on our merge_df
series:
df1.merge(df2.drop('Domain', 1), left_on='Domain', right_on=merge_df, how='outer')
Domain Visits Ads
0 aaa 1.0 2.0
1 bbb 3.0 4.0
2 ddd 5.0 NaN
3 c.com NaN 3.0
4 e.net NaN 6.0
Since we stored a separate series to merge on, and didn't modify df2
, we are done after this step, and you have your merged DataFrame.
Upvotes: 4
Reputation: 153460
Try and preprocess df2, to get domain in the same format as df1:
df2['key'] = df2['Domain'].str.extract('(\w+)\.?')
df_out = df1.merge(df2, left_on='Domain', right_on='key', suffixes=('_x',''), how='outer')
df_out['Domain'] = df_out['Domain'].combine_first(df_out['Domain_x'])
df_out = df_out.drop(['Domain_x','key'], axis=1)
print(df_out)
Output:
Visits Domain Ads
0 1.0 aaa.com 2.0
1 3.0 bbb 4.0
2 5.0 ddd NaN
3 NaN c.com 3.0
4 NaN e.net 6.0
Upvotes: 1