Reputation: 325
I have been asking this question quite few times and it seems that no one can answer it...
I am looking for a loop/fuction or a simple code that can look through 2 columns in different dataframes and output a third column. This example is quite different from a simple merge or a merge where we have one string and one substring... in this example we have 2 substrings to compare and output a third column if one of the key stored in the substring line is present in in the other substring line of the diffrent dataframe.
This is the example:
data = [['Alex','11111111 20'],['Bob','2222222 0000'],['Clarke','33333 999999']]
df = pd.DataFrame(data,columns=['Name','Code'])
df
data = [['Reed','0000 88'],['Ros',np.nan],['Jo','999999 66']]
df1 = pd.DataFrame(data,columns=['SecondName','Code2'])
What i need is to find where part of both codes are the same like 999999
or 0000
and output the SecondName
The expected output:
I have done my reserach and I found a way to locate a substring from a string but not from another substring like in my case.
Upvotes: 1
Views: 167
Reputation: 1669
You need to split the codes and concat all possible combinations of merged-results.
Here is the working code:
import pandas as pd
import numpy as np
data = [['Alex','11111111 20'],['Bob','2222222 0000'],['Clarke','33333 999999']]
df = pd.DataFrame(data,columns=['Name','Code'])
data = [['Reed','0000 88'],['Ros',np.nan],['Jo','999999 66']]
df1 = pd.DataFrame(data,columns=['SecondName','Code2'])
df[['c1', 'c2']] = df.Code.str.split(" ", expand=True)
df1[['c1', 'c2']] = df1.Code2.str.split(" ", expand=True)
rdf = pd.DataFrame()
for col1 in ['c1', 'c2']:
for col2 in ['c1', 'c2']:
rdf = pd.concat([rdf, df.merge(df1, left_on=[col1], right_on=[col2], how='inner')], axis=0)
rdf = df.merge(rdf[['Name', 'SecondName']], on='Name', how='outer')
print(rdf[['Name', 'SecondName']])
Output:
Name SecondName
0 Alex NaN
1 Bob Reed
2 Clarke Jo
Upvotes: 2