Ilja
Ilja

Reputation: 1053

Pandas - DF with lists - find all rows that match a string in any of the columns

I have the following dataframe:

ID     col1        col2     col3
0     ['a','b'] ['d','c'] ['e','d']
1     ['s','f'] ['f','a'] ['d','aaa']

Give an input string = 'a' I want to receive a dataframe like this:

ID     col1      col2     col3
0       1          0        0
1       0          1        0

I see how to do it with a for loop but that takes forever, and there must be a method I miss

Upvotes: 2

Views: 58

Answers (1)

jezrael
jezrael

Reputation: 862641

Processing lists in pandas is not vectorized supported, so performance is worse like scalars.

First idea is reshape lists columns to Series by DataFrame.stack, create scalars by Series.explode, so possible compare by a, test if match per first levels by Series.any, and last reshape back with convert boolean mask to integers:

df1 = df.set_index('ID').stack().explode().eq('a').any(level=[0,1]).unstack().astype(int)
print (df1)
    col1  col2  col3
ID                  
0      1     0     0
1      0     1     0

Or is possible use DataFrame.applymap for elementwise testing by lambda function with in:

df1 = df.set_index('ID').applymap(lambda x: 'a' in x).astype(int)

Or create for each lists column DataFrame, so possible test by a with DataFrame.any:

f = lambda x: pd.DataFrame(x.tolist(), index=x.index).eq('a').any(axis=1)
df1 = df.set_index('ID').apply(f).astype(int)

Upvotes: 3

Related Questions