How do I join two dataframes based on a dataframe's row value being a substring/contained in another dataframe's row value?

Question

I have the following two dataframes -

DataFrame - A

DataFrame - B

Now, I want to perform a join on both DataFrames on the columns 'Code - A' and 'Code - B' such that the join is performed if the value in 'Code - A' is a substring of or contains the value in 'Code - B'. A sample output for this example would be as follows -

The values in 'Code - A' and 'Code - B' are of type int, which I am converting to a string for easier manipulation. And, so far, I have implemented the following code:

data_A = [['A', 100010], ['B', 200010]] 
A = pd.DataFrame(data_A, columns = ['Type - A', 'Code - A'])
data_B = [['AA', 1000], ['BB', 10001020],['CC',10001030],['DD',10001040],['EE',20001020],['FF',20001030],['GG',200010]] 
B = pd.DataFrame(data_B, columns = ['Type - B', 'Code - B'])
A['Code - A'] = A['Code - A'].astype(str)
B['Code - B'] = B['Code - B'].astype(str)  
A['join'] = 1 #creating a dummy column in Dataframe A called join
B['join'] = 1 #creating a dummy column in Dataframe B called join
cartesian = A.merge(B, on='join').drop('join', axis=1) #getting the cartesian product
cartesian['Match'] = cartesian.apply(lambda x: x['Code - A'].find(x['Code - B']), axis=1).ge(0) #checking for columns where 'Code - A' has a string which is either a substring of 'Code - B' or one where 'Code - B' is a substring of 'Code - A'
mask = cartesian['Match'] == True #filtering to see the matched items
full = cartesian.loc[mask]

However, this only returns the following Dataframe.

How should I modify my code to get the aforementioned output?

Thanks!

jschnurr · Accepted Answer

After you create the cartesian dataframe, you can query it like so:

cartesian[cartesian.apply(lambda row: row['Code - B'] in row['Code - A'] or row['Code - A'] \
  in row['Code - B'], axis = 1)]

Output:

   Type - A Code - A Type - B  Code - B
0         A   100010       AA      1000
1         A   100010       BB  10001020
2         A   100010       CC  10001030
3         A   100010       DD  10001040
11        B   200010       EE  20001020
12        B   200010       FF  20001030
13        B   200010       GG    200010

How do I join two dataframes based on a dataframe's row value being a substring/contained in another dataframe's row value?

Answers (2)

Related Questions

How do I join two dataframes based on a dataframe&#39;s row value being a substring/contained in another dataframe&#39;s row value?

Answers (2)

Related Questions

How do I join two dataframes based on a dataframe's row value being a substring/contained in another dataframe's row value?