Reputation: 174
I'd like to create a new Boolean column that is True when a substring is found in an existing column and False otherwise.
In this example, I'd like to search for the substring "abc" in column a and create a Boolean column b indicating whether column a contains the string.
a b
zabc True
wxyz False
abcy True
defg False
I've tried something like
df['b'] = df['a'].map(lambda x: True if 'abc' in x else False)
But this gave me an error saying "argument of type 'NoneType' is not iterable"
I also tried
df['b'] = False
df['b'][df['a'].str.contains('abc')] = True
But I got the error "cannot index with vector containing NA / NaN values"
Can someone explain the errors and what I can do about them? I have confirmed that column a exists and contains values, but some rows contain None.
Upvotes: 3
Views: 7499
Reputation: 1813
Not the best solution, but you can check for null values with pd.isnull() or convert null values to a string with str().
import pandas as pd
df = pd.DataFrame({'a':['zabc', None, 'abcy', 'defg']})
df['a'].map(lambda x: True if 'abc' in str(x) else False)
or
df['a'].map(lambda x: False if pd.isnull(x) or 'abc' not in x else True)
Result:
0 True
1 False
2 True
3 False
Name: a, dtype: bool
Upvotes: 3
Reputation: 1159
Your first snippet works as long as the column has no None values. Here it is on my sample.
s = pd.Series(['cat','hat','dog','fog','pet'])
d = pd.DataFrame(s, columns=['test'])
d['b'] = d['test'].map(lambda x: True if 'og' in x else False)
d
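As a rough check of where the first error comes from, the sketch below runs the same kind of lambda on a Series that does contain a None, as the question describes. The sample values and the object dtype are assumptions made for illustration. The `in` test is applied to None itself, which is not iterable, so Python raises a TypeError.

```python
import pandas as pd

# Assumed sample data: one None among the strings, as in the question.
s = pd.Series(["zabc", None, "abcy", "defg"], dtype=object)

try:
    # 'abc' in None is evaluated for the second row and fails.
    s.map(lambda x: "abc" in x)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

So the lambda itself is fine; it just needs to skip or convert the missing values before testing for the substring.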
Upvotes: 1
Reputation: 2569
This is how to do it.
df["b"] = df["a"].str.contains("abc")
Regarding your error:
It seems that you have missing values (None or np.nan) in column a, so str.contains returns a missing value for those rows. When you then try to index with an array that contains missing values, pandas tells you that this is not possible.
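One way to avoid the missing values in the result, sketched here on assumed sample data, is the `na` parameter of `str.contains`, which sets the value used for missing rows. Passing `regex=False` is optional and only makes the pattern a literal substring.

```python
import pandas as pd

# Assumed sample data with a None, mirroring the question.
df = pd.DataFrame({"a": ["zabc", None, "abcy", "defg"]})

# na=False makes rows with missing values come out as False,
# so the result is a clean Boolean column.
df["b"] = df["a"].str.contains("abc", na=False, regex=False)
print(df)
```

With the missing rows filled in this way, the column can also be used safely as a Boolean mask for indexing.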
Upvotes: 6