Reputation: 468
My question is somewhat similar to this one: How to merge pandas on string contains?, but I need a different output and the problem itself is a bit more complex. So I have 2 dataframes similar to the ones below:
df1 = pd.DataFrame({'ref_name':['city-louisville','city-louisville','city-louisville', 'town-lexington','town-lexington','town-lexington'], 'un_name1':['CPU1','CPU2','GPU1','CPU1','CPU2','GPU1'], 'value1':[10,15,28,12,14,14]})
df2 = pd.DataFrame({'ref_name':['louisville','louisville','lexington','lexington'], 'un_name2':['CPU','GPU','CPU','GPU'], 'value2':[25,28,26,14]})
I need to join based on ref_name
and un_name
based on the substrings within them. They won't always be as clean as this, but I figured it made for a decent little example. So my desired output in this case would look something like this:
ref_name | un_name1 | un_name2 | value1 | value2
---------------------------------------------------------
louisville| CPU1 | CPU | 10 | 25
louisville| CPU2 | CPU | 15 | 25
louisville| GPU1 | GPU | 28 | 28
lexington | CPU1 | CPU | 12 | 26
lexington | CPU2 | CPU | 14 | 26
lexington | GPU1 | GPU | 14 | 14
Thanks in advance for any help on this!
Upvotes: 2
Views: 380
Reputation: 93141
Here's the most generic version that I can think of. Performance may be a problem if your dataframes are big.
mask1 = df2['ref_name'].apply(lambda value: df1['ref_name'].str.contains(value))
mask2 = df2['un_name2'].apply(lambda value: df1['un_name1'].str.contains(value))
mask = (mask1 & mask2).stack().rename_axis(['index2', 'index1'])
mask = mask[mask].index.to_frame(False)
result = mask.merge(df2, left_on='index2', right_index=True) \
.merge(df1, left_on='index1', right_index=True)
Result:
index2 index1 ref_name_x un_name2 value2 ref_name_y un_name1 value1
0 0 louisville CPU 25 city-louisville CPU1 10
0 1 louisville CPU 25 city-louisville CPU2 15
1 2 louisville GPU 28 city-louisville GPU1 28
2 3 lexington CPU 26 town-lexington CPU1 12
2 4 lexington CPU 26 town-lexington CPU2 14
3 5 lexington GPU 14 town-lexington GPU1 14
Trimming / renaming the columns is an excercise left for the OP.
Upvotes: 2