Reputation: 471
I would like to select those rows in a column that contains any of the substrings in a list. This is what I have for now.
product = ['LID', 'TABLEWARE', 'CUP', 'COVER', 'CONTAINER', 'PACKAGING']
df_plastic_prod = df_plastic[df_plastic['Goods Shipped'].str.contains(product)]
df_plastic_prod.info()
Sample df_plastic
Name Product
David PLASTIC BOTTLE
Meghan PLASTIC COVER
Melanie PLASTIC CUP
Aaron PLASTIC BOWL
Venus PLASTIC KNIFE
Abigail PLASTIC CONTAINER
Sophia PLASTIC LID
Desired df_plastic_prod
Name Product
Meghan PLASTIC COVER
Melanie PLASTIC CUP
Abigail PLASTIC CONTAINER
Sophia PLASTIC LID
Thanks in advance! I appreciate any assistance on this!
Upvotes: 6
Views: 4893
Reputation: 862661
For match values by subtrings join all values of list by |
for regex or
- so get values LID
or TABLEWARE
...:
Solution working well also with 2 or more words in list
.
pat = '|'.join(r"\b{}\b".format(x) for x in product)
df_plastic_prod = df_plastic[df_plastic['Product'].str.contains(pat)]
print (df_plastic_prod)
Name Product
1 Meghan PLASTIC COVER
2 Melanie PLASTIC CUP
5 Abigail PLASTIC CONTAINER
6 Sophia PLASTIC LID
Upvotes: 6
Reputation: 9701
One solution is using regex to parse the 'Product'
column, and test if any of the extracted values are in the product
list, then filter the original DataFrame on the results.
In this case, a very simple regex pattern is used ((\w+)$
) which matches a single word at the end of a line.
Sample code:
df.iloc[df['Product'].str.extract('(\w+)$').isin(product).to_numpy(), :]
Output:
Name Product
1 Meghan PLASTIC COVER
2 Melanie PLASTIC CUP
5 Abigail PLASTIC CONTAINER
6 Sophia PLASTIC LID
Setup:
product = ['LID', 'TABLEWARE', 'CUP',
'COVER', 'CONTAINER', 'PACKAGING']
data = {'Name': ['David', 'Meghan', 'Melanie',
'Aaron', 'Venus', 'Abigail', 'Sophia'],
'Product': ['PLASTIC BOTTLE', 'PLASTIC COVER', 'PLASTIC CUP',
'PLASTIC BOWL', 'PLASTIC KNIFE', 'PLASTIC CONTAINER',
'PLASTIC LID']}
df = pd.DataFrame(data)
Upvotes: 0