Reputation: 13
I have a pandas DataFrame that has reviews in it, and I want to search for a specific word in all of the columns.
df["Summary"].str.lower().str.contains("great", na=False)
This gives the outcome as true or false, but I want to create a new column with 1 or 0 written in the corresponding rows.
For example, if the review has 'great' in it, the row should get 1, otherwise 0. I tried this:
if df["Summary"].str.lower().str.contains("great", na=False) == True:
    df["Great"] = '1'
else:
    df["Great"] = '0'
It gives this error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). How can I solve this?
Upvotes: 1
Views: 12655
Reputation: 141
Your condition df["Summary"].str.lower().str.contains("great", na=False) returns a Series of True/False values, one per row. It can't be compared to a single True in an if statement, because a Series is not a Python boolean. Instead, you can evaluate the test row by row:
df['Great'] = df['Summary'].apply(lambda x: 'great' in x.lower())
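To make the failure concrete, here is a minimal sketch, assuming a small made-up Summary column (the sample rows are not from the question). It shows that using the whole Series as an if condition raises ValueError, and that a per-row test wrapped in int() produces the 1/0 values the question asks for. Note that it assumes every entry is a string; a missing value would break the x.lower() call.

```python
import pandas as pd

# Hypothetical sample data, not from the original question.
df = pd.DataFrame({"Summary": ["Great taste", "Not good", "GREAT value"]})

# One boolean per row, not a single True/False.
mask = df["Summary"].str.lower().str.contains("great", na=False)

# Using the whole Series as an `if` condition raises ValueError.
try:
    if mask == True:
        pass
    raised = False
except ValueError:
    raised = True

# Evaluating the test per row and converting to int gives the 1/0 column.
df["Great"] = df["Summary"].apply(lambda x: int("great" in x.lower()))
```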
Upvotes: 2
Reputation: 403208
Since True/False corresponds to 1/0, all you need is an astype conversion from bool to int:
df['Great'] = df["Summary"].str.contains("great", case=False, na=False).astype(int)
Also note that I've removed the str.lower call and added case=False as an argument to str.contains for a case-insensitive comparison.
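As a quick check of the astype approach, the sketch below runs it on a made-up column that includes a missing value (the data is an assumption for illustration). With na=False, the missing row is treated as "no match" and ends up as 0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing review.
df = pd.DataFrame({"Summary": ["Great taste", "Not good", np.nan, "so GREAT"]})

# case=False makes the match case-insensitive; na=False maps NaN to False.
df["Great"] = (
    df["Summary"].str.contains("great", case=False, na=False).astype(int)
)
```

The resulting column holds integers rather than the strings '1' and '0', so it can be summed or averaged directly.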
Another solution would be to lowercase and then disable the regex matching for better performance.
df['Great'] = (df["Summary"].str.lower()
                            .str.contains("great", regex=False, na=False)
                            .astype(int))
Finally, you can also use a list comprehension:
df['Great'] = [1 if 'great' in s.lower() else 0 for s in df['Summary']]
If you need to handle numeric data as well, use
df['Great'] = [
    1 if isinstance(s, str) and 'great' in s.lower() else 0
    for s in df['Summary']
]
I've detailed the advantages of list comprehensions for object data ad nauseam in this post of mine: For loops with pandas - When should I care?
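To illustrate why the isinstance check matters, here is a small sketch on a made-up column that mixes strings, a number, and a missing value (the data is hypothetical). The unguarded comprehension fails as soon as it reaches a non-string, while the guarded one maps such entries to 0.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing strings, a number, and a missing value.
df = pd.DataFrame({"Summary": ["Great taste", 42, np.nan, "not bad"]})

# Without the isinstance guard, calling .lower() on the integer fails.
try:
    unguarded = [1 if "great" in s.lower() else 0 for s in df["Summary"]]
    failed = False
except AttributeError:
    failed = True

# With the guard, non-string entries simply map to 0.
df["Great"] = [
    1 if isinstance(s, str) and "great" in s.lower() else 0
    for s in df["Summary"]
]
```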
Upvotes: 2
Reputation: 1272
A possible solution using NumPy:
import numpy as np
df["Great"] = np.where(df["Summary"].str.lower().str.contains("great", na=False), '1', '0')
Check the numpy.where documentation for details.
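A minimal sketch of the np.where approach on made-up data (the sample rows and the GreatLabel column name are assumptions for illustration). It contrasts integer outputs with the string outputs '1' and '0'; which one you want depends on whether the column will be used numerically.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing review.
df = pd.DataFrame({"Summary": ["Great taste", "Not good", None]})

# Boolean mask: True where the review mentions "great" in any casing.
mask = df["Summary"].str.contains("great", case=False, na=False)

# np.where picks the second argument where mask is True, else the third.
df["Great"] = np.where(mask, 1, 0)            # integer flags
df["GreatLabel"] = np.where(mask, "1", "0")   # string labels
```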
Upvotes: 0