Reputation: 1053
I have the following dataframe:
ID col1 col2 col3
0 ['a','b'] ['d','c'] ['e','d']
1 ['s','f'] ['f','a'] ['d','aaa']
Give an input string = 'a'
I want to receive a dataframe like this:
ID col1 col2 col3
0 1 0 0
1 0 1 0
I see how to do it with a for
loop but that takes forever, and there must be a method I miss
Upvotes: 2
Views: 58
Reputation: 862641
Processing lists in pandas is not vectorized supported, so performance is worse like scalars.
First idea is reshape lists columns to Series
by DataFrame.stack
, create scalars by Series.explode
, so possible compare by a
, test if match per first levels by Series.any
, and last reshape back with convert boolean mask to integers:
df1 = df.set_index('ID').stack().explode().eq('a').any(level=[0,1]).unstack().astype(int)
print (df1)
col1 col2 col3
ID
0 1 0 0
1 0 1 0
Or is possible use DataFrame.applymap
for elementwise testing by lambda function with in
:
df1 = df.set_index('ID').applymap(lambda x: 'a' in x).astype(int)
Or create for each lists column DataFrame
, so possible test by a
with DataFrame.any
:
f = lambda x: pd.DataFrame(x.tolist(), index=x.index).eq('a').any(axis=1)
df1 = df.set_index('ID').apply(f).astype(int)
Upvotes: 3