Reputation: 51

Filtering a column in a data frame to get only column entries that contain a specific word

print(data['PROD_NAME'])

0           Natural Chip        Compny SeaSalt175g
1                         CCs Nacho Cheese    175g
2           Smiths Crinkle Cut  Chips Chicken 170g
3           Smiths Chip Thinly  S/Cream&Onion 175g
4         Kettle Tortilla ChpsHny&Jlpno Chili 150g
                            ...                   
264831     Kettle Sweet Chilli And Sour Cream 175g
264832               Tostitos Splash Of  Lime 175g
264833                    Doritos Mexicana    170g
264834     Doritos Corn Chip Mexican Jalapeno 150g
264835               Tostitos Splash Of  Lime 175g
Name: PROD_NAME, Length: 264836, dtype: object

I only want the product names that contain the word 'chip' somewhere.

new_data = pd.DataFrame(data['PROD_NAME'].str.contains("Chip"))

print(pd.DataFrame(new_data))


        PROD_NAME
0            True
1           False
2            True
3            True
4           False
...           ...
264831      False
264832      False
264833      False
264834       True
264835      False

[264836 rows x 1 columns]

My question is: how do I drop the rows where the result is False and, instead of True in the data frame above, get the product name that produced the match?

Btw, this is part of the Quantium data analytics virtual internship program.

Upvotes: 0

Answers (2)

oli5679

Reputation: 1749

Try using .loc with a boolean filter and column names to select the rows and columns that meet your criteria. The pandas documentation on .loc covers this in more detail, but in short: the part before the comma is the boolean Series you want to use as a filter (in your case, the result of str.contains('Chip')), and the part after the comma is the column or columns you want to return (in your case 'PROD_NAME', but any other column or list of columns works too).

Example

import pandas as pd
example = {'PROD_NAME':['Chippy','ABC','A bag of Chips','MicroChip',"Product C"],'Weight':range(5)}

data = pd.DataFrame(example)

data.loc[data.PROD_NAME.str.contains('Chip'),'PROD_NAME']

#0            Chippy
#2    A bag of Chips
#3         MicroChip
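
As a small extension of the example above, here is a sketch of the "column/columns" case: passing a list after the comma returns a DataFrame with several columns instead of a single Series. The names mask and subset are only illustrative, and the data is the same toy data as above.

```python
import pandas as pd

# Same toy data as in the example above.
data = pd.DataFrame({
    'PROD_NAME': ['Chippy', 'ABC', 'A bag of Chips', 'MicroChip', 'Product C'],
    'Weight': range(5),
})

# Boolean Series: True where PROD_NAME contains the substring 'Chip'.
mask = data['PROD_NAME'].str.contains('Chip')

# A list of column names after the comma returns a DataFrame
# restricted to the matching rows and the listed columns.
subset = data.loc[mask, ['PROD_NAME', 'Weight']]
print(subset)
```

Storing the mask in its own variable also makes it easy to reuse, for example `data.loc[~mask]` for the rows that do not match.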

Upvotes: 1

Mohamed Thasin ah

Reputation: 11192

You are almost there. Use the boolean Series as a mask on the original data frame instead of wrapping it in a new one:

res = data[data['PROD_NAME'].str.contains("Chip")]

Output:

                                 prod_name
0   Natural Chip        Compny SeaSalt175g
2   Smiths Crinkle Cut  Chips Chicken 170g
3   Smiths Chip Thinly  S/Cream&Onion 175g
8  Doritos Corn Chip Mexican Jalapeno 150g
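
Note that str.contains("Chip") is case-sensitive and returns a missing value for missing names, which boolean indexing will reject. If that matters for your data, a possible variant is sketched below. It assumes you want a case-insensitive match and want missing names treated as non-matches; the sample rows are partly made up for illustration (the None entry and the lowercase name are hypothetical, not from the real data set).

```python
import pandas as pd

# Small illustrative sample; the last two rows are hypothetical.
data = pd.DataFrame({'PROD_NAME': [
    'Natural Chip        Compny SeaSalt175g',
    'CCs Nacho Cheese    175g',
    'Smiths Crinkle Cut  Chips Chicken 170g',
    None,                          # hypothetical missing product name
    'kettle chips sea salt 150g',  # hypothetical lowercase name
]})

# case=False ignores upper/lower case; na=False marks missing names as
# non-matching so the mask contains only True/False.
mask = data['PROD_NAME'].str.contains('chip', case=False, na=False)

# Indexing with the mask keeps only the matching rows.
res = data[mask]
print(res)
```

If you only need the names rather than the whole rows, `res['PROD_NAME']` gives them as a Series.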

Upvotes: 0
