ewong18

Reputation: 174

Create new pandas dataframe column containing boolean output from searching for substrings

I'd like to create a new column that is True if a substring is found in an existing column, and False otherwise.

So in this example, I'd like to search for the substring "abc" in column a and create a Boolean column b indicating whether column a contains that string.

a      b
zabc   True
wxyz   False
abcy   True
defg   False

I've tried something like

df['b'] = df['a'].map(lambda x: True if 'abc' in x else False)

But this gave me an error saying "argument of type 'NoneType' is not iterable"
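The first error can be reproduced without pandas. Series.map calls the lambda once per element, so the row holding None ends up evaluating `'abc' in None`, which Python rejects. A minimal sketch, where `contains_abc` is a hypothetical named version of the lambda above and the sample values are made up:

```python
def contains_abc(x):
    """Hypothetical named equivalent of the question's lambda."""
    return True if 'abc' in x else False


values = ['zabc', 'wxyz', None]
results = []  # outcomes for the values that could be checked
errors = []   # error messages for the values that could not

for v in values:
    try:
        results.append(contains_abc(v))
    except TypeError as exc:
        # `in` needs a string or other container on its right-hand side,
        # and None is neither.
        errors.append(str(exc))

print(results)  # [True, False]
print(errors)   # one message mentioning 'NoneType' (wording varies by Python version)
```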

I also tried

df['b'] = False
df['b'][df['a'].str.contains('abc')] = True

But I got the error "cannot index with vector containing NA / NaN values"
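The second error comes from the mask itself. A minimal sketch, assuming a small object-dtype column with one None in it (values made up for illustration), shows that str.contains leaves the None row missing instead of False, so the mask is not purely boolean and pandas refuses to index with it:

```python
import pandas as pd

# dtype=object keeps the None as a plain Python missing value.
df = pd.DataFrame({'a': ['zabc', 'wxyz', None, 'abcy']}, dtype=object)

mask = df['a'].str.contains('abc')
print(mask.isna().tolist())  # [False, False, True, False]: the None row stays missing

try:
    df['a'][mask]
    indexing_failed = False
except ValueError as exc:
    indexing_failed = True
    print(exc)  # exact wording differs between pandas versions
```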

Can someone explain these errors and what I can do about them? I have confirmed that column 'a' exists and contains values, but some rows contain None values.

Upvotes: 3

Views: 7499

Answers (3)

Michael Gardner

Reputation: 1813

Not the best solution, but you can either check for null values with pd.isnull() or convert them to strings with str().

import pandas as pd

df = pd.DataFrame({'a':['zabc', None, 'abcy', 'defg']})

df['a'].map(lambda x: True if 'abc' in str(x) else False)

or

df['a'].map(lambda x: False if pd.isnull(x) or 'abc' not in x else True)

Result:

    0     True
    1    False
    2     True
    3    False
    Name: a, dtype: bool
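One reason the str() variant is "not the best solution": str(None) is the literal text 'None', so a search term that happens to occur inside that word gives a false match, while the pd.isnull() variant does not. A small sketch, with 'on' as a hypothetical search term picked to trigger the problem:

```python
import pandas as pd

# dtype=object keeps the missing value as None rather than NaN.
df = pd.DataFrame({'a': ['zabc', None, 'abcy', 'defg']}, dtype=object)

term = 'on'  # hypothetical term that occurs inside the text "None"

# str() turns None into 'None', which contains 'on'.
via_str = df['a'].map(lambda x: term in str(x))

# Missing values are skipped before the membership test.
via_isnull = df['a'].map(lambda x: False if pd.isnull(x) or term not in x else True)

print(via_str.tolist())     # [False, True, False, False]
print(via_isnull.tolist())  # [False, False, False, False]
```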

Upvotes: 3

abdoulsn

Reputation: 1159

Your first code is OK. Here is the output on my sample:

import pandas as pd

s = pd.Series(['cat','hat','dog','fog','pet'])
d = pd.DataFrame(s, columns=['test'])
d['b'] = d['test'].map(lambda x: True if 'og' in x else False)
d

[Image: output of the code above]

Upvotes: 1

Florian Bernard

Reputation: 2569

This is how to do it:

df["b"] = df["a"].str.contains("abc")
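If column a contains None or NaN, the result of str.contains still has missing entries in those rows, so column b is not a clean True/False column. Series.str.contains accepts an na argument that sets the value used for missing elements; a minimal sketch with made-up sample data:

```python
import pandas as pd

# Sample column with one missing value; dtype=object keeps it as None.
df = pd.DataFrame({'a': ['zabc', 'wxyz', None, 'abcy', 'defg']}, dtype=object)

# Default behaviour: the None row stays missing in the result.
default_result = df['a'].str.contains('abc')

# na=False reports missing rows as "not found".
df['b'] = df['a'].str.contains('abc', na=False)
print(df)
```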

Regarding your error:

It seems that you have np.nan (or None) values in column a. The str.contains method returns a missing value for those rows, and since you then try to index with an array containing missing values, pandas tells you that this is not possible.
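Applied to the second attempt from the question, the fix is to make the mask purely boolean before indexing with it. A minimal sketch, assuming sample data modeled on the question; it also swaps the chained `df['b'][...] = True` for .loc, because chained assignment can end up modifying an intermediate object instead of df:

```python
import pandas as pd

df = pd.DataFrame({'a': ['zabc', 'wxyz', None, 'abcy']}, dtype=object)

# Start with an all-False column, as in the question's second attempt.
df['b'] = False

# Purely boolean mask: missing rows count as "not found".
mask = df['a'].str.contains('abc', na=False)

# .loc writes into df itself rather than into the Series returned by df['b'].
df.loc[mask, 'b'] = True
print(df)
```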

Upvotes: 6
