Reputation: 21
Sorry for the somewhat basic question; I'm pretty new to Python/pandas.
I'm trying to create a column in my DataFrame that is True or False depending on whether another column contains any (not all) of the strings in a list. Currently my code looks like this:
keywords_list = ["foo", "bar"]  # ...etc.
df['relevant'] = df['Description'].isin(keywords_list)
I know that my 'Description' column contains some of the values in the list, but every row comes back False. I've looked at similar Stack Overflow questions (see below), and they all say to do what I'm doing. But as I read the pandas documentation (also below), isin only works if the column contains all of the values in the list. Is there a function I can use that returns whether the column contains any of the values in the list? Please help!
Filter out rows based on list of strings in Pandas
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isin.html
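A minimal sketch of why every row came back False, using made-up sample data: Series.isin tests whether each whole cell is exactly equal to one of the listed values. It does not look for the keywords inside longer strings, so a description such as 'foo bar blah' never matches.

```python
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({'Description': ['foo bar blah', 'foo', 'newfoo']})
keywords_list = ["foo", "bar"]

# isin compares each whole cell against the list: a cell is True only if it
# is exactly equal to one of the keywords, never for a partial (substring) match
exact = df['Description'].isin(keywords_list)
print(exact.tolist())  # [False, True, False]
```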
Upvotes: 2
Views: 4551
Reputation: 38415
You may have to split each description into separate words with str.split and then use isin on the result:
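import pandas as pd
df = pd.DataFrame({'Description': ['foo bar blah', 'new foo', 'newfoo', 'bar']})
keywords_list = ["foo", "bar"]
# One column per word, True where a word exactly matches a keyword, then any() per row
df['Description'].str.split(expand=True).isin(keywords_list).any(axis=1)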
0     True
1     True
2    False
3     True
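A possible variant of the same word-matching idea, not taken from the answer: instead of expanding every description into a wide DataFrame, Series.explode puts each word on its own row and groupby(level=0).any() folds the matches back to one value per original row. This sketch assumes the DataFrame has a unique index, because rows are regrouped by index label.

```python
import pandas as pd

df = pd.DataFrame({'Description': ['foo bar blah', 'new foo', 'newfoo', 'bar']})
keywords_list = ["foo", "bar"]

# Split each description into a list of words, then give each word its own row
# (explode repeats the original row's index label for every word it produces)
words = df['Description'].str.split().explode()

# Flag words that exactly match a keyword, then collapse back to one value per
# original row: True if any of that row's words matched
df['relevant'] = words.isin(keywords_list).groupby(level=0).any()
print(df)
```

Like the expand=True version, this matches whole words only, so 'newfoo' stays False.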
Upvotes: 3
Reputation: 294278
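Use pandas.Series.str.contains, joining the keywords with | into a single regular expression so that a row is True if its description contains any of them: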
df['Description'].str.contains('|'.join(keywords_list))
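A minimal sketch extending that one-liner, with two additions that are not in the original answer: each keyword is passed through re.escape in case it contains regex metacharacters, and na=False is given so that missing descriptions yield False instead of NaN. The sample data, including the None row, is made up for illustration. The sketch also shows a \b word-boundary variant.

```python
import re

import pandas as pd

# Hypothetical sample data; the None row stands in for a missing description
df = pd.DataFrame({'Description': ['foo bar blah', 'new foo', 'newfoo', 'bar', None]})
keywords_list = ["foo", "bar"]

# Escape each keyword so characters like '.', '+' or '(' are matched literally,
# then join them into one alternation pattern: foo|bar
pattern = '|'.join(map(re.escape, keywords_list))

# Substring match: 'newfoo' counts because it contains 'foo'.
# na=False makes missing descriptions come back False instead of NaN.
df['relevant'] = df['Description'].str.contains(pattern, na=False)

# Whole-word variant: \b word boundaries stop 'foo' from matching inside 'newfoo'
word_pattern = r'\b(?:' + pattern + r')\b'
df['relevant_word'] = df['Description'].str.contains(word_pattern, na=False)
print(df)
```

The two answers differ on 'newfoo': plain str.contains flags it because 'foo' appears as a substring, while the split/isin answer and the \b variant do not. Which behaviour you want depends on whether partial-word matches should count.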
Upvotes: 4