Reputation: 887
I would like to check whether several columns contain a string, and generate a Boolean column with the result. This is easy to do for a single column, but generates an Attribute Error (AttributeError: 'DataFrame' object has no attribute 'str'
) when this method is applied to multiple columns.
Example:
import pandas as pd
c1=[x+'x' for x in 'abcabc']
c2=['Y'+x+'m' for x in 'CABABC']
cols=['A','B']
df=pd.DataFrame(list(zip(c1,c2)),columns=cols)
df
Returns:
A B
0 ax YCm
1 bx YAm
2 cx YBm
3 ax YAm
4 bx YBm
5 cx YCm
The following code works when applied to a single column, but does not work when applied to several columns. I'd like something that fits in here and gives the desired result:
df['C']=df[cols].str.contains('c',case=False)
Thus the desired output is:
A B C
0 ax YCm True
1 bx YAm False
2 cx YBm True
3 ax YAm False
4 bx YBm False
5 cx YCm True
Edit: I updated my example to reflect the desire to actually search for whether the column "contains" a value, rather than "is equivalent to" that value.
Edit: in terms of timings, here's the benchmark I'd like to be able to match or beat, without creating the new columns (using a *1000
to the columns in my toy example):
newcols=['temp_'+x for x in cols]
for col in cols:
df['temp_'+col]=df[col].str.contains('c',case=False)
df['C']=df[newcols].any(axis=1)
df=df[['A','B','C']]
Upvotes: 2
Views: 5680
Reputation: 14949
An option via applymap
:
df['C'] = df.applymap(lambda x: 'c' in str(x).lower()).any(1)
Via stack/unstack
:
df['C'] = df.stack().str.contains('c', case=False).unstack().any(1)
df['C'] = df.stack().str.lower().str.contains('c').unstack().any(1)
OUTPUT:
A B C
0 ax YCm True
1 bx YAm False
2 cx YBm True
3 ax YAm False
4 bx YBm False
5 cx YCm True
Upvotes: 4
Reputation: 808
I would run an apply across the columns and take the any()
of those:
df['C']=df.apply(lambda y: y.str.contains('c',case=False),1).any(1)
Upvotes: 1