Matthias Gallagher
Matthias Gallagher

Reputation: 471

Pandas: Select rows that contain any substring from a list

I would like to select those rows in a column that contains any of the substrings in a list. This is what I have for now.

product = ['LID', 'TABLEWARE', 'CUP', 'COVER', 'CONTAINER', 'PACKAGING']

df_plastic_prod = df_plastic[df_plastic['Goods Shipped'].str.contains(product)]

df_plastic_prod.info()

Sample df_plastic

Name          Product
David        PLASTIC BOTTLE
Meghan       PLASTIC COVER
Melanie      PLASTIC CUP 
Aaron        PLASTIC BOWL
Venus        PLASTIC KNIFE
Abigail      PLASTIC CONTAINER
Sophia       PLASTIC LID

Desired df_plastic_prod

Name          Product
Meghan       PLASTIC COVER
Melanie      PLASTIC CUP 
Abigail      PLASTIC CONTAINER
Sophia       PLASTIC LID

Thanks in advance! I appreciate any assistance on this!

Upvotes: 6

Views: 4893

Answers (2)

jezrael
jezrael

Reputation: 862661

For match values by subtrings join all values of list by | for regex or - so get values LID or TABLEWARE ...:

Solution working well also with 2 or more words in list.

pat = '|'.join(r"\b{}\b".format(x) for x in product)
df_plastic_prod = df_plastic[df_plastic['Product'].str.contains(pat)]
print (df_plastic_prod)
      Name            Product
1   Meghan      PLASTIC COVER
2  Melanie        PLASTIC CUP
5  Abigail  PLASTIC CONTAINER
6   Sophia        PLASTIC LID

Upvotes: 6

s3dev
s3dev

Reputation: 9701

One solution is using regex to parse the 'Product' column, and test if any of the extracted values are in the product list, then filter the original DataFrame on the results.

In this case, a very simple regex pattern is used ((\w+)$) which matches a single word at the end of a line.

Sample code:

df.iloc[df['Product'].str.extract('(\w+)$').isin(product).to_numpy(), :]

Output:

      Name            Product
1   Meghan      PLASTIC COVER
2  Melanie        PLASTIC CUP
5  Abigail  PLASTIC CONTAINER
6   Sophia        PLASTIC LID

Setup:

product = ['LID', 'TABLEWARE', 'CUP', 
           'COVER', 'CONTAINER', 'PACKAGING']

data = {'Name': ['David', 'Meghan', 'Melanie', 
                 'Aaron', 'Venus', 'Abigail', 'Sophia'],
        'Product': ['PLASTIC BOTTLE', 'PLASTIC COVER', 'PLASTIC CUP', 
                    'PLASTIC BOWL', 'PLASTIC KNIFE', 'PLASTIC CONTAINER',
                    'PLASTIC LID']}
    
df = pd.DataFrame(data)

Upvotes: 0

Related Questions