fawkemvegas

Reputation: 13

How to search for a word in a column with Pandas

I have a pandas DataFrame that contains reviews, and I want to search for a specific word in every row of a column.

df["Summary"].str.lower().str.contains("great", na=False)

This returns True or False for each row, but I want to create a new column with 1 or 0 in the corresponding rows.

For example, if the review contains 'great', the value should be 1, otherwise 0. I tried this:

if df["Summary"].str.lower().str.contains("great", na=False) == True:
    df["Great"] = '1'
else:
    df["Great"] = '0'

It gives this error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). How can I solve this?

Upvotes: 1

Views: 12655

Answers (3)

NBWL

Reputation: 141

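Your condition df["Summary"].str.lower().str.contains("great", na=False) returns a Series of True/False values, one per row. A Series is not a single Python boolean, so it cannot be compared with True in an if statement. Instead, you can test each row with apply and convert the result to integers:

df['Great'] = df['Summary'].apply(lambda x: 'great' in x.lower()).astype(int)

Note that this assumes every value in Summary is a string; a missing value would raise an AttributeError.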
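Here is a minimal sketch of both points, in case it helps. The three-row frame and the has_great helper are made up for illustration and are not from the question. It shows the if statement failing on a Series, and a row-wise apply that tolerates missing values and yields 1/0.

```python
import pandas as pd

# Hypothetical sample data; None stands in for a missing review.
df = pd.DataFrame({"Summary": ["Great taste", "Not good", None]})

mask = df["Summary"].str.lower().str.contains("great", na=False)

# Using a whole Series as an `if` condition raises ValueError.
try:
    if mask == True:
        pass
except ValueError as exc:
    error_message = str(exc)


def has_great(value):
    # Hypothetical helper: anything that is not a string counts as "no match".
    return isinstance(value, str) and "great" in value.lower()


# apply returns booleans; astype(int) turns them into 1/0.
df["Great"] = df["Summary"].apply(has_great).astype(int)
```

The isinstance guard is one way to stay safe on missing values; passing na=False to str.contains, as in the question, is another.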

Upvotes: 2

cs95

Reputation: 403208

Since True/False corresponds to 1/0, all you need is an astype conversion from bool to int:

df['Great'] = df["Summary"].str.contains("great", case=False, na=False).astype(int)

Note that I've removed the str.lower call and passed case=False to str.contains for a case-insensitive comparison.
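As a quick check of the astype approach, here is a small sketch on made-up data (the four sample reviews are an assumption, not from the question):

```python
import pandas as pd

# Hypothetical sample data with mixed case and one missing value.
df = pd.DataFrame(
    {"Summary": ["GREAT value", "It was great", "Mediocre", None]}
)

# case=False ignores case, na=False maps missing values to False,
# and astype(int) converts True/False to 1/0.
df["Great"] = (
    df["Summary"].str.contains("great", case=False, na=False).astype(int)
)
```

The missing row ends up as 0 because na=False is applied before the integer conversion.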


Another option is to lowercase the strings first and pass regex=False, which does a plain substring search instead of regex matching and can be faster.

df['Great'] = (df["Summary"].str.lower()
                            .str.contains("great", regex=False, na=False)
                            .astype(int))
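To illustrate what regex=False changes besides speed, here is a sketch with a made-up pattern containing a dot (the sample rows and the "great." pattern are assumptions for illustration only). With regex matching the dot matches any character; with regex=False it must be a literal dot.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"Summary": ["Great!", "greatly improved", "great. Would buy again", None]}
)
lowered = df["Summary"].str.lower()

# Default (regex=True): "." is a wildcard, so "great!" and "greatl" match too.
as_regex = lowered.str.contains("great.", na=False).astype(int)

# regex=False: the pattern is treated as a plain substring.
as_literal = lowered.str.contains("great.", regex=False, na=False).astype(int)
```

For a plain word such as "great" both variants give the same result, so regex=False is a safe choice there.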

Finally, you can also use a list comprehension:

df['Great'] = [1 if 'great' in s.lower() else 0 for s in df['Summary']]

If the column also contains non-string values such as numbers or NaN, use

df['Great'] = [
    1 if isinstance(s, str) and 'great' in s.lower() else 0 
    for s in df['Summary']
]
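A short sketch of why the isinstance check matters, using a made-up column with mixed types (the sample values are an assumption):

```python
import pandas as pd

# Hypothetical sample data mixing strings, numbers and a missing value.
df = pd.DataFrame({"Summary": ["Great product", 5, 3.5, None, "Not bad"]})

# Without the guard, the first non-string value raises AttributeError.
try:
    unguarded = [1 if "great" in s.lower() else 0 for s in df["Summary"]]
except AttributeError:
    unguarded = None

# With the guard, non-string values simply become 0.
df["Great"] = [
    1 if isinstance(s, str) and "great" in s.lower() else 0
    for s in df["Summary"]
]
```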

I've detailed the advantages of list comprehensions for object data ad nauseam in this post of mine: For loops with pandas - When should I care?

Upvotes: 2

David Sidarous

Reputation: 1272

A possible solution using numpy

import numpy as np
df["Great"] = np.where(df["Summary"].str.lower().str.contains("great", na=False), '1', '0')

See the numpy.where documentation for details.
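A runnable sketch of the np.where approach on made-up data (the sample rows are an assumption). It uses the integers 1 and 0 rather than the strings '1' and '0', since the question asks for numeric flags; swap them back if strings are what you need.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"Summary": ["Great service", "Terrible", None]})

# Boolean mask: True where the lowercased text contains "great".
mask = df["Summary"].str.lower().str.contains("great", na=False)

# np.where picks 1 where the mask is True and 0 elsewhere.
df["Great"] = np.where(mask, 1, 0)
```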

Upvotes: 0
