Reputation: 23
The column I'm interested in looks like this:
names=['nonsoluable water', 'water percentage 98% grade', 'special chemical with grade chlorine', 'name with value']
There are other columns too. Those are just numbers/identifiers.
I need to check whether each row of the column contains any value from this list:
check_for_these = ['waters', 'grades', '%', 'chemical']
If a row contains any of those values, I want to flag it in a new column.
I've tried this:
df['names'].apply(lambda x: any([k in x for k in check_for_these]))
but it raises errors/gives wrong output.
And the isin function also raises errors:
df['match'] = df["names"].isin(check_for_these)
print(df)
I want the output to be like the image given below
Upvotes: 0
Views: 2199
Reputation: 11
First, make sure all the data in the names column is in a consistent format by converting every string to lower case. You can do it like this:
df['names'] = df['names'].apply(lambda s: s.lower())
Now you can use Python's 'in' operator on strings. It returns True if one string is a substring of another. The operator is case-sensitive, which is why the lower-casing step matters. It works like this:
'hello' in 'all my friends, hello' ----> True
'$' in 'bitcoin is cheap $.$' ---------> True
'Hello' in 'hello world' --------------> False
so, in your case, you have the list
check_for_these = ['waters', 'grades', '%', 'chemical']
then you can build a True/False pandas Series with
df['names'].apply(lambda s: any(reqWord in s for reqWord in check_for_these))
Now you just have to map {True: 1, False: 0} to make the flag column, so the solution is
df['flag'] = df['names'].apply(lambda s: any(reqWord in s for reqWord in check_for_these)).map({True: 1, False: 0})
Note: You can skip the first step by lower-casing inside the lambda instead:
df['flag'] = df['names'].apply(lambda s: any(reqWord in s.lower() for reqWord in check_for_these)).map({True: 1, False: 0})
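Putting the steps together, here is a minimal runnable sketch of the same idea. The DataFrame is rebuilt from the sample values in the question, and the helper name has_any is just something I made up for illustration, not anything from pandas.

```python
import pandas as pd

# Sample data rebuilt from the question; the real df has other columns too.
df = pd.DataFrame({
    "names": [
        "nonsoluable water",
        "water percentage 98% grade",
        "special chemical with grade chlorine",
        "name with value",
    ]
})
check_for_these = ["waters", "grades", "%", "chemical"]


def has_any(text, words):
    """Return True if any word is a substring of text, ignoring case."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in words)


# Series.apply forwards the extra keyword argument to has_any.
df["flag"] = df["names"].apply(has_any, words=check_for_these).astype(int)
print(df)
```

Keep in mind that the substring test is literal: 'waters' is not contained in 'nonsoluable water', so with this exact list the first row gets 0. If you want 'water' and 'grade' to match, put the singular forms in the list.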
Upvotes: 1
Reputation:
Try this:
df['flag'] = df['names'].str.contains('|'.join(check_for_these), regex=True, case=False).astype(int)
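A slightly more defensive sketch of the same approach, in case it helps. The re.escape call and na=False are my additions, not part of the one-liner above: escaping keeps search terms that contain regex metacharacters (for example '+' or '(') from being interpreted as patterns, and na=False treats missing names as non-matches. The sample DataFrame is rebuilt from the question.

```python
import re

import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "names": [
        "nonsoluable water",
        "water percentage 98% grade",
        "special chemical with grade chlorine",
        "name with value",
    ]
})
check_for_these = ["waters", "grades", "%", "chemical"]

# Escape each term so it is matched literally, then join with regex OR.
pattern = "|".join(re.escape(word) for word in check_for_these)

# case=False makes the match case-insensitive; na=False maps NaN to False.
df["flag"] = (
    df["names"].str.contains(pattern, regex=True, case=False, na=False).astype(int)
)
print(df)
```

As with any substring match, 'waters' and 'grades' will not match 'water' or 'grade', so only the rows containing '%' or 'chemical' are flagged with this list.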
Upvotes: 2