David Seroy
David Seroy

Reputation: 199

Filter Dataframe if column is in any part of list

I'm trying to filter a dataframe to rows with column values that are in a list. However, the value in the column will not be an exact match to the list. Can I do some sort of wildcard or further contains?

For example, based on the list and dataframe below, I'd like to return or filter to just row 0.

list = ["123 ABC", "456 DEF", "789 GHI"]

id    idlist
0      ABC    
1      XYZ

Something like

df_2 = df[df["idlist"].isin(list)]

would not work because it would be searching for an exact match.

Upvotes: 1

Views: 324

Answers (2)

MarioHo
MarioHo

Reputation: 31

You can use the operator library to check if the string is partly inlcuded:

import operator
operator.contains('ABC','123 ABC')

Upvotes: 0

cs95
cs95

Reputation: 402333

This is a more complicated string matching problem than usual, but you can use a list comprehension for performance:

lst = ["123 ABC", "456 DEF", "789 GHI"]
df['match'] = [any(x in l for l in lst) for x in df['idlist']]
df

   id idlist  match
0   0    ABC   True
1   1    XYZ  False

To simply filter, use

df[[any(x in l for l in lst) for x in df['idlist']]]

   id idlist
0   0    ABC

List comprehensions are my to-go syntax for many string operations. I've written a detailed writeup about their advantages in For loops with pandas - When should I care?.

If you need to handle NaNs, use a function with try-catch handling.

def search(x, lst):
    try:
        return any(x in l for l in lst)
    except TypeError:
        return False

df[[search(x, lst) for x in df['idlist']]]

   id idlist
0   0    ABC

Upvotes: 1

Related Questions