Reputation: 69
Please help me complete this piece of code. Let me know of any other detail is required.
Thanks in advance!
Given: a column 'PROD_NAME' from pandas dataframe of string type (e.g. Smiths Crinkle Cut Chips Chicken g), a list of certain words (['Chip', 'Chips' etc])
To do: if none of the words from the list is contained in the strings of the dataframe objects, we drop the whole row. Basically we're removing unnecessary products from a dataframe.
This is what data looks like:
Here's my code:
# create a function to Keep only those products which have
# chip, chips, doritos, dorito, pringle, Pringles, Chps, chp, in their name
def onlyChips(df, *cols):
temp = []
chips = ['Chip', 'Chips', 'Doritos', 'Dorito', 'Pringle', 'Pringles', 'Chps', 'Chp']
copy = cp.deepcopy(df)
for col in [*cols]:
for i in range(len(copy[col])):
for item in chips:
if item not in copy[col][i]:
flag = False
else:
flag = True
break;
# drop only those string which doesn't have any match from chips list, if flag never became True
if not flag:
# drop the whole row
return <new created dataframe>
new = onlyChips(df_txn, 'PROD_NAME')
Upvotes: 1
Views: 256
Reputation: 10960
Filter the rows instead of deleting them. Create a boolean mask for each row. Use str.contains
on each column you need to search and see if any of the columns match the given criteria row-wise. Filter the rows if not.
search_cols = ['PROD_NAME']
mask = df[search_cols].apply(lambda x: x.str.contains('|'.join(chips))).any(axis=1)
df = df[mask]
Upvotes: 2