Reputation: 199
I'm trying to filter a dataframe to rows with column values that are in a list. However, the value in the column will not be an exact match to the list. Can I do some sort of wildcard or further contains?
For example, based on the list and dataframe below, I'd like to return or filter to just row 0.
list = ["123 ABC", "456 DEF", "789 GHI"]
id idlist
0 ABC
1 XYZ
Something like
df_2 = df[df["idlist"].isin(list)]
would not work because it would be searching for an exact match.
Upvotes: 1
Views: 324
Reputation: 31
You can use the operator library to check if the string is partly inlcuded:
import operator
operator.contains('ABC','123 ABC')
Upvotes: 0
Reputation: 402333
This is a more complicated string matching problem than usual, but you can use a list comprehension for performance:
lst = ["123 ABC", "456 DEF", "789 GHI"]
df['match'] = [any(x in l for l in lst) for x in df['idlist']]
df
id idlist match
0 0 ABC True
1 1 XYZ False
To simply filter, use
df[[any(x in l for l in lst) for x in df['idlist']]]
id idlist
0 0 ABC
List comprehensions are my to-go syntax for many string operations. I've written a detailed writeup about their advantages in For loops with pandas - When should I care?.
If you need to handle NaNs, use a function with try-catch handling.
def search(x, lst):
try:
return any(x in l for l in lst)
except TypeError:
return False
df[[search(x, lst) for x in df['idlist']]]
id idlist
0 0 ABC
Upvotes: 1