Sid

Reputation: 23

Check every row of a df column for values in a list

The column I'm interested in looks like this:

names=['nonsoluable water', 'water percentage 98% grade', 'special chemical with grade chlorine', 'name with value']

There are other columns too. Those are just numbers/identifiers.

I need to check every row of that column for any of the values in this list:

check_for_these = ['waters', 'grades', '%', 'chemical']

If a row contains any of those values, I want to flag that row in a new column.

I've tried this:

df['names'].apply(lambda x: any([k in x for k in check_for_these]))

but it raises errors/gives wrong output.

And the isin function also raises errors:

df['match'] = df["names"].isin(check_for_these)
print(df)

I want the output to be like the image given below

Upvotes: 0

Views: 2199

Answers (2)

Alonso G.

Reputation: 11

First, make sure the text in the names column is in a consistent format by converting it to lower case:

df['names'] = df['names'].str.lower()

Next, you can use Python's 'in' operator on strings. It returns True if one string is a substring of another. The check is case-sensitive, which is why the lower-casing step matters. It works like this:

'hello' in 'all my friends, hello'   # True
'$' in 'bitcoin is cheap $.$'        # True
'Hello' in 'hello world'             # False

In your case, you have the list

check_for_these = ['waters', 'grades', '%', 'chemical']

Then you can build a boolean pandas Series with

df['names'].apply(lambda name: any(word in name for word in check_for_these))

Now you only need to map {True: 1, False: 0} to create the flag column, so the full solution is

df['flag'] = df['names'].apply(lambda name: any(word in name for word in check_for_these)).map({True: 1, False: 0})

Note: You can skip the first step by lower-casing inside the lambda instead:

df['flag'] = df['names'].apply(lambda name: any(word in name.lower() for word in check_for_these)).map({True: 1, False: 0})
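
For reference, here is a minimal self-contained sketch of the same idea. The DataFrame is reconstructed from the sample values in the question (the real frame has other columns), and the helper name has_any is just something I made up for illustration:

```python
import pandas as pd

# Sample data reconstructed from the question; the real frame has more columns.
df = pd.DataFrame({
    "names": [
        "nonsoluable water",
        "water percentage 98% grade",
        "special chemical with grade chlorine",
        "name with value",
    ]
})
check_for_these = ["waters", "grades", "%", "chemical"]


def has_any(text, needles):
    """Return True if any needle is a substring of text, ignoring case."""
    lowered = text.lower()
    return any(needle.lower() in lowered for needle in needles)


# Hypothetical helper applied row by row; bool -> int gives the 1/0 flag.
df["flag"] = df["names"].apply(lambda text: has_any(text, check_for_these)).astype(int)
print(df["flag"].tolist())  # [0, 1, 1, 0]
```

Note that 'waters' and 'grades' never match in this sample, because the rows only contain the singular forms 'water' and 'grade'. If you want those rows flagged too, put the singular forms in the list.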

Upvotes: 1

user17242583

Reputation:

Try this:

df['flag'] = df['names'].str.contains('|'.join(check_for_these), regex=True, case=False).astype(int)
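
One caveat: the joined string is treated as a regular expression, so a search term containing characters such as '.', '+' or '(' may match more than intended or raise an error. A possible sketch that escapes each term first is below. The sample rows and the 'c++' term are made up for illustration, and na=False is my assumption about how missing names should be treated:

```python
import re

import pandas as pd

# Hypothetical sample data; 'c++' is included to show a term with regex metacharacters.
df = pd.DataFrame({
    "names": ["water percentage 98% grade", "written in C++", "name with value"]
})
check_for_these = ["%", "c++"]

# Escape each term so it is matched literally, then join the terms with '|' (regex OR).
pattern = "|".join(re.escape(term) for term in check_for_these)

# case=False ignores case; na=False treats missing values as "no match".
df["flag"] = df["names"].str.contains(pattern, regex=True, case=False, na=False).astype(int)
print(df["flag"].tolist())  # [1, 1, 0]
```

Without re.escape, the unescaped pattern 'c++' is not a valid regular expression, so the call would fail instead of flagging the row.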

Upvotes: 2
