Reputation: 1857
I have a DataFrame named df, and I want to know whether each row of df contains the element 'a'.
import pandas as pd
import numpy as np
df=pd.DataFrame({'id':[1,2,3],'item1':['a','c','a'],
'item2':['b','d','e'],'item3':['c','e',np.nan]})
Input:
id item1 item2 item3
0 1 a b c
1 2 c d e
2 3 a e NaN
In the new column contains_a, 1 means that at least one of the columns item1, item2 or item3 contains the element 'a', and 0 means none of them does.
Expected:
id item1 item2 item3 contains_a
0 1 a b c 1
1 2 c d e 0
2 3 a e NaN 1
Upvotes: 0
Views: 147
Reputation: 109546
Check each column except id (that is, every column from position 1 onward, df.iloc[:, 1:]) using the string accessor to see whether it contains the letter 'a', then use any along the rows (axis=1). Finally, convert the boolean result to an integer (1 or 0).
>>> df.assign(contains_a=df.iloc[:, 1:].apply(lambda s: s.str.contains('a')).any(axis=1).astype(int))
id item1 item2 item3 contains_a
0 1 a b c 1
1 2 c d e 0
2 3 a e NaN 1
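Here is a minimal self-contained sketch of the same approach. The na=False argument is an addition not in the original answer: it makes str.contains treat missing cells as non-matches, so the frame passed to any is purely boolean. Keep in mind that str.contains tests for a substring, so a cell such as 'ba' would also count as containing 'a'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'item1': ['a', 'c', 'a'],
                   'item2': ['b', 'd', 'e'],
                   'item3': ['c', 'e', np.nan]})

# Every column except the first ('id'), checked for the substring 'a';
# na=False turns NaN cells into False instead of NaN.
items = df.iloc[:, 1:]
has_a = items.apply(lambda s: s.str.contains('a', na=False))

# True if any item column in the row matched, then cast to 1/0.
df = df.assign(contains_a=has_a.any(axis=1).astype(int))
print(df)
```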
To make this more general for multiple targets:
targets = ['a', 'aa', 'b', 'c']
d = {'contains_{}'.format(target):
         df.iloc[:, 1:].apply(lambda s: s.str.contains(target)).any(axis=1).astype(int)
     for target in targets}
>>> df.assign(**d)
id item1 item2 item3 contains_a contains_aa contains_b contains_c
0 1 a b c 1 0 1 1
1 2 c d e 0 0 0 1
2 3 a e NaN 1 0 0 0
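The multi-target version can also be wrapped in a small helper. This is a sketch under stated assumptions: the function name contains_flags is hypothetical, and regex=False and na=False are additions beyond the original answer. regex=False makes each target a literal substring rather than a regular expression, which matters for targets containing characters such as '.' or '+'. The new columns come out in the order of targets.

```python
import numpy as np
import pandas as pd


def contains_flags(frame, targets, columns):
    """Hypothetical helper: one 0/1 column per target.

    A row gets 1 in 'contains_<target>' when any of the given columns
    contains target as a literal substring (NaN counts as no match).
    """
    sub = frame[columns]
    flags = {}
    for target in targets:
        # apply runs immediately, so each lambda sees the current target.
        hits = sub.apply(lambda s: s.str.contains(target, regex=False, na=False))
        flags[f'contains_{target}'] = hits.any(axis=1).astype(int)
    return pd.DataFrame(flags, index=frame.index)


df = pd.DataFrame({'id': [1, 2, 3],
                   'item1': ['a', 'c', 'a'],
                   'item2': ['b', 'd', 'e'],
                   'item3': ['c', 'e', np.nan]})

result = df.join(contains_flags(df, ['a', 'aa', 'b', 'c'], ['item1', 'item2', 'item3']))
print(result)
```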
Upvotes: 1
Reputation: 76917
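Use filter to pick the item columns, eq to test each cell for an exact match with 'a', and any along the rows:
In [578]: df['contains_a'] = df.filter(like='item').eq('a').any(axis=1).astype(int)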
In [579]: df
Out[579]:
id item1 item2 item3 contains_a
0 1 a b c 1
1 2 c d e 0
2 3 a e NaN 1
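Unlike str.contains, eq checks for cells exactly equal to 'a', so a cell such as 'ba' would not count. Below is a minimal runnable version, assuming the columns to check are exactly those whose names contain 'item'. The isin variant for several exact values at once is an addition, not part of the original answer, and the column name contains_b_or_d is purely illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'item1': ['a', 'c', 'a'],
                   'item2': ['b', 'd', 'e'],
                   'item3': ['c', 'e', np.nan]})

# Select item1..item3 by name; NaN == 'a' evaluates to False.
items = df.filter(like='item')
df['contains_a'] = items.eq('a').any(axis=1).astype(int)

# Illustrative variant: flag rows where any item cell equals one of several values.
df['contains_b_or_d'] = items.isin(['b', 'd']).any(axis=1).astype(int)
print(df)
```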
Upvotes: 2