amquack
amquack

Reputation: 887

Check if at least one column contains a string in pandas

I would like to check whether several columns contain a string, and generate a Boolean column with the result. This is easy to do for a single column, but generates an Attribute Error (AttributeError: 'DataFrame' object has no attribute 'str') when this method is applied to multiple columns.

Example:

import pandas as pd

c1=[x+'x' for x in 'abcabc']
c2=['Y'+x+'m' for x in 'CABABC']
cols=['A','B']

df=pd.DataFrame(list(zip(c1,c2)),columns=cols)
df

Returns:

    A   B
0   ax  YCm
1   bx  YAm
2   cx  YBm
3   ax  YAm
4   bx  YBm
5   cx  YCm

The following code works when applied to a single column, but does not work when applied to several columns. I'd like something that fits in here and gives the desired result:

df['C']=df[cols].str.contains('c',case=False)

Thus the desired output is:

    A   B   C
0   ax  YCm True
1   bx  YAm False
2   cx  YBm True
3   ax  YAm False
4   bx  YBm False
5   cx  YCm True

Edit: I updated my example to reflect the desire to actually search for whether the column "contains" a value, rather than "is equivalent to" that value.

Edit: in terms of timings, here's the benchmark I'd like to be able to match or beat, without creating the new columns (using a *1000 to the columns in my toy example):

newcols=['temp_'+x for x in cols]

for col in cols:
    df['temp_'+col]=df[col].str.contains('c',case=False)

df['C']=df[newcols].any(axis=1)
df=df[['A','B','C']]

Upvotes: 2

Views: 5680

Answers (2)

Nk03
Nk03

Reputation: 14949

An option via applymap :

df['C'] = df.applymap(lambda x: 'c' in str(x).lower()).any(1)

Via stack/unstack:

df['C'] = df.stack().str.contains('c', case=False).unstack().any(1)
df['C'] = df.stack().str.lower().str.contains('c').unstack().any(1)

OUTPUT:

    A    B      C
0  ax  YCm   True
1  bx  YAm  False
2  cx  YBm   True
3  ax  YAm  False
4  bx  YBm  False
5  cx  YCm   True

Upvotes: 4

distracted-biologist
distracted-biologist

Reputation: 808

I would run an apply across the columns and take the any() of those:

df['C']=df.apply(lambda y: y.str.contains('c',case=False),1).any(1)

Upvotes: 1

Related Questions