Reputation: 88
Problem description: I need to set a variable for each row, but only if the value is contained in the list held in a second column of the same row.
Sample Dataframe:
df = pd.DataFrame({'col1': ['A', 'T' , 'P', 'Z'], 'col2': ['A, B, C', 'D, E, F' , 'G, H, I, P', 'M, N, R, ZGTR']})
I need to retrieve all rows where col1 is one of the items in col2. Expected result:
col1 col2
'A' 'A, B, C'
'P' 'G, H, I, P'
My approach, which raises a TypeError saying that Series objects are mutable and cannot be hashed:
df[df['col2'].str.match(df['col1'])]
As far as I understand, I have to indicate somehow that the comparison should be done within each row. I know iterrows would be a solution, but I would prefer something without looping.
Upvotes: 1
Views: 347
Reputation: 862511
Use a list comprehension that tests membership with the in operator on the split values:
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'T', 'P', 'Z'],
                   'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR']})
# Keep rows where col1 is an exact item of the comma-separated list in col2
df = df[[b in a.split(', ') for a, b in df[['col2', 'col1']].values]]
print(df)
  col1        col2
0    A     A, B, C
2    P  G, H, I, P
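The split step matters for the 'Z' row of the sample data. The sketch below is only an illustration and assumes the separator is always ', '. It builds the same row-wise mask with zip and compares it against a plain substring test. The names mask, substring_mask and result are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['A', 'T', 'P', 'Z'],
    'col2': ['A, B, C', 'D, E, F', 'G, H, I, P', 'M, N, R, ZGTR'],
})

# Exact membership: split col2 on ', ' and look for col1 among the items.
mask = [key in items.split(', ') for key, items in zip(df['col1'], df['col2'])]
result = df[mask]

# Substring test without splitting: 'Z' is found inside 'ZGTR',
# so the last row would be kept as well.
substring_mask = [key in items for key, items in zip(df['col1'], df['col2'])]

print(mask)            # [True, False, True, False]
print(substring_mask)  # [True, False, True, True]
```

Building a separate mask also leaves the original df untouched, which can be handy if the unfiltered rows are still needed later.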
Upvotes: 2