Reputation: 696
I have two pandas dataframes, and I need to filter df2 and get a new df with only the matched rows based on a substring match. I can do this with an exact match, but not a partial match. I have tried a few things and can't figure out what I'm missing.
df1 looks like:
LEVEL SO# LI# BOMITEM PRTNO
0 1 00000 000000 0000 123-456789-001
1 2 00000 000000 0000 123-456019-002
2 3 00000 000000 0000 123-456029-003
3 4 00000 000000 0000 123-456039-004
4 5 00000 000000 0000 123-456049-005
df2 looks like:
name test_table_id pk version Key_PN
0 Somename 48 48 1.0
1 Somename 48 49 1.1 123-456789-
2 Somename 48 201 1.2 000-000000-000
3 Somename 48 202 1.3 123-984560-000
4 Antoher name 193 194 1.0 123-984560-321
I am looking to get a dataframe that looks like this:
name test_table_id pk version Key_PN
0 Somename 48 48 1.1 123-456789-
This is the code I used that worked only with an exact match:
df 2= df2[pd.Series(list(df2['Key_PN']), index=df2.index).isin(list(df1['PRTNO']))]
Upvotes: 1
Views: 28
Reputation: 26676
Use regex to remove the last 2 digits in df.PRTNO and then check similarity using .isin()
df2[df2.Key_PN.isin(df['PRTNO'].str.replace("\d+$",""))]
name test_table_id pk version Key_PN
1 Somename 48 49 1.1 123-456789-
Upvotes: 1