Chris Jones
Chris Jones

Reputation: 696

How can I filter a pandas dataframe of substrings based on another dataframe's column of full strings?

I have two pandas dataframes, and I need to filter df2 and get a new df with only the matched rows based on a substring match. I can do this with an exact match, but not a partial match. I have tried a few things and can't figure out what I'm missing.

df1 looks like:

      LEVEL    SO#     LI# BOMITEM              PRTNO  
0         1  00000  000000    0000     123-456789-001  
1         2  00000  000000    0000     123-456019-002 
2         3  00000  000000    0000     123-456029-003  
3         4  00000  000000    0000     123-456039-004  
4         5  00000  000000    0000     123-456049-005  

df2 looks like:

                          name  test_table_id   pk version       Key_PN
0                Somename                  48   48     1.0
1                Somename                  48   49     1.1  123-456789-
2                Somename                  48  201     1.2  000-000000-000
3                Somename                  48  202     1.3  123-984560-000
4                Antoher name              193 194     1.0  123-984560-321

I am looking to get a dataframe that looks like this:

                          name  test_table_id   pk version       Key_PN
0                Somename                  48   48     1.1  123-456789-

This is the code I used that worked only with an exact match:

df 2= df2[pd.Series(list(df2['Key_PN']), index=df2.index).isin(list(df1['PRTNO']))]

Upvotes: 1

Views: 28

Answers (1)

wwnde
wwnde

Reputation: 26676

Use regex to remove the last 2 digits in df.PRTNO and then check similarity using .isin()

df2[df2.Key_PN.isin(df['PRTNO'].str.replace("\d+$",""))]



    name  test_table_id  pk  version       Key_PN
1  Somename             48  49      1.1  123-456789-

Upvotes: 1

Related Questions