Reputation: 51
print(data['PROD_NAME'])
0 Natural Chip Compny SeaSalt175g
1 CCs Nacho Cheese 175g
2 Smiths Crinkle Cut Chips Chicken 170g
3 Smiths Chip Thinly S/Cream&Onion 175g
4 Kettle Tortilla ChpsHny&Jlpno Chili 150g
...
264831 Kettle Sweet Chilli And Sour Cream 175g
264832 Tostitos Splash Of Lime 175g
264833 Doritos Mexicana 170g
264834 Doritos Corn Chip Mexican Jalapeno 150g
264835 Tostitos Splash Of Lime 175g
Name: PROD_NAME, Length: 264836, dtype: object
I only want the product names that contain the word 'chip' somewhere.
new_data = pd.DataFrame(data['PROD_NAME'].str.contains("Chip"))
print(pd.DataFrame(new_data))
PROD_NAME
0 True
1 False
2 True
3 True
4 False
... ...
264831 False
264832 False
264833 False
264834 True
264835 False
[264836 rows x 1 columns]
My question: how do I drop the rows that are False and, instead of a True value, get the product name that produced the True?
Btw, this is part of the Quantium data analytics virtual internship program.
Upvotes: 0
Views: 35
Reputation: 1749
Try using .loc with a boolean mask and a column name to select only the rows and columns you need (see the pandas documentation for .loc). In data.loc[rows, columns], the part before the comma is the boolean Series used as the row filter (in your case str.contains('Chip')), and the part after the comma is the column or list of columns to return (in your case 'PROD_NAME', but it also works with any other column or columns).
Example
import pandas as pd
example = {'PROD_NAME':['Chippy','ABC','A bag of Chips','MicroChip',"Product C"],'Weight':range(5)}
data = pd.DataFrame(example)
data.loc[data.PROD_NAME.str.contains('Chip'),'PROD_NAME']
#0 Chippy
#2 A bag of Chips
#3 MicroChip
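To illustrate the "column/columns after the comma" part, here is a small sketch on the same toy data: it passes a list of columns instead of a single name, and uses .tolist() when you just want the names as a plain Python list.

```python
import pandas as pd

# Same toy data as the example above
example = {'PROD_NAME': ['Chippy', 'ABC', 'A bag of Chips', 'MicroChip', 'Product C'],
           'Weight': range(5)}
data = pd.DataFrame(example)

# Row filter: True where PROD_NAME contains "Chip" (case-sensitive by default)
mask = data['PROD_NAME'].str.contains('Chip')

# Before the comma: which rows; after the comma: a list of columns to return
subset = data.loc[mask, ['PROD_NAME', 'Weight']]
print(subset)

# A single column label returns a Series; .tolist() turns it into a plain list
names = data.loc[mask, 'PROD_NAME'].tolist()
print(names)  # ['Chippy', 'A bag of Chips', 'MicroChip']
```

Passing a list (even a one-element list like ['PROD_NAME']) returns a DataFrame, while a single label returns a Series.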
Upvotes: 1
Reputation: 11192
You are almost there. Pass the boolean Series back into the DataFrame as a mask so that only the True rows are kept:
res = data[data['PROD_NAME'].str.contains("Chip")]
Output:
PROD_NAME
0 Natural Chip Compny SeaSalt175g
2 Smiths Crinkle Cut Chips Chicken 170g
3 Smiths Chip Thinly S/Cream&Onion 175g
8 Doritos Corn Chip Mexican Jalapeno 150g
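Note that str.contains("Chip") is case-sensitive, while the question asks for the word 'chip'. A minimal sketch of a case-insensitive version, using a few names from the question plus one made-up lowercase entry ("Example potato chips 100g" is hypothetical) and a missing value to show the na handling:

```python
import pandas as pd

# Names taken from the question, plus a hypothetical lowercase entry and a missing value
data = pd.DataFrame({'PROD_NAME': [
    'Natural Chip Compny SeaSalt175g',
    'CCs Nacho Cheese 175g',
    'Smiths Crinkle Cut Chips Chicken 170g',
    'Kettle Tortilla ChpsHny&Jlpno Chili 150g',
    'Example potato chips 100g',  # hypothetical row, not from the dataset
    None,
]})

# case=False matches "Chip", "chip", "CHIP", ...; na=False treats missing names as non-matches
mask = data['PROD_NAME'].str.contains('chip', case=False, na=False)

# Keep only the matching rows and return just the product names
chip_names = data.loc[mask, 'PROD_NAME']
print(chip_names.tolist())
```

A plain substring search will not catch abbreviations such as "Chps" in "Kettle Tortilla ChpsHny&Jlpno Chili 150g", so check the data for spellings like that if they should count as chips.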
Upvotes: 0