himself

Reputation: 88

Compare two string columns in a DataFrame row-wise

Problem description: I need to set a variable for each row, but only if the value is contained in a list stored in a second column of the same row.

Sample Dataframe:

df = pd.DataFrame({'col1': ['A', 'T' , 'P', 'Z'], 'col2': ['A, B, C', 'D, E, F' , 'G, H, I, P', 'M, N, R, ZGTR']})

I need to get all rows where col1 is part of col2. Expected result:

col1    col2
'A'     'A, B, C'
'P'     'G, H, I, P'

My approach, which raises a TypeError saying that Series objects are mutable and cannot be hashed:

df[df['col2'].str.match(df['col1'])]

As far as I understand, I have to indicate somehow that the comparison should be done within each row. I know iterrows would be a solution, but I would prefer something without looping.

Upvotes: 1

Views: 347

Answers (1)

jezrael

Reputation: 862511

Use a list comprehension that splits each col2 value on ', ' and tests membership of the col1 value with in:

import pandas as pd

df = pd.DataFrame({'col1': ['A', 'T' , 'P', 'Z'], 
                   'col2': ['A, B, C', 'D, E, F' , 'G, H, I, P', 'M, N, R, ZGTR']})
df = df[[b in a.split(', ') for a, b in df[['col2', 'col1']].values]]
print(df)
  col1        col2
0    A     A, B, C
2    P  G, H, I, P
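
To see why the split matters, here is a minimal sketch that compares a plain substring test with the token test above. It uses the sample data from the question; the names substring_mask and token_mask are only illustrative, and zip is used in place of .values purely as a stylistic assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'T', 'P', 'Z'],
    'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR'],
})

# Substring test: 'Z' is found inside 'ZGTR', so row 3 is kept by mistake.
substring_mask = [b in a for a, b in zip(df['col2'], df['col1'])]

# Token test: split on ', ' first, so only whole list items count as a match.
token_mask = [b in a.split(', ') for a, b in zip(df['col2'], df['col1'])]

print(df[substring_mask].index.tolist())  # [0, 2, 3]
print(df[token_mask].index.tolist())      # [0, 2]
```

Both versions still loop in Python, one row at a time. Note that the split assumes the items are always separated by exactly a comma and one space.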

Upvotes: 2
