Fluxy

Reputation: 2978

How to filter pandas DataFrames that have columns with lists?

I have a pandas DataFrame that contains lists in some columns:

col1    col2      col3
a       [1]       [a,b]
b       [1,2,3]   [a,c]
b       [1,2,3]   [b,c]
b       [1,3]     [b,c]
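To reproduce the example, the DataFrame can be built as below. This is a sketch that assumes the letters in col3 are plain strings such as "a", since the printed table does not show quotes.

```python
import pandas as pd

# Rebuild the example from the question; col3 is assumed to hold strings.
df = pd.DataFrame({
    "col1": ["a", "b", "b", "b"],
    "col2": [[1], [1, 2, 3], [1, 2, 3], [1, 3]],
    "col3": [["a", "b"], ["a", "c"], ["b", "c"], ["b", "c"]],
})
print(df)
```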

I want to get a subset of this DataFrame. In this subset, col2 should contain both 1 and 2, and col3 should contain "a".

In my example, the result should be:

col1    col2      col3
b       [1,2,3]   [a,c]

This is what I tried so far:

df[df["col2"].str.contains("1,2", na=False)]

How can I solve my task?
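A likely reason the attempt above returns nothing: `.str.contains` works on string cells, and for a cell that holds a list it produces a missing value, which `na=False` then turns into `False`. The check below rebuilds the example data (assuming col3 holds strings) and shows that the mask ends up all `False`:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "b", "b"],
    "col2": [[1], [1, 2, 3], [1, 2, 3], [1, 3]],
    "col3": [["a", "b"], ["a", "c"], ["b", "c"], ["b", "c"]],
})

# Each col2 cell is a list, not a string, so the pattern cannot be searched in it;
# na=False replaces the resulting missing values with False.
mask = df["col2"].str.contains("1,2", na=False)
print(mask.tolist())
print(df[mask])  # no rows are selected
```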

Upvotes: 2

Views: 49

Answers (3)

Aman Neo

Reputation: 223

Inline approach: assuming df is the original DataFrame, you can build the boolean mask with a list comprehension:

df[[1 in i and 2 in i and "a" in j for i, j in zip(df.col2, df.col3)]]

This solves the problem, and you can adapt the condition as needed.
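One way to adapt the logic is to express the required values as sets and use `set.issubset`, which accepts any iterable, so each list cell can be checked directly. The names `required_col2` and `required_col3` are hypothetical, introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "b", "b"],
    "col2": [[1], [1, 2, 3], [1, 2, 3], [1, 3]],
    "col3": [["a", "b"], ["a", "c"], ["b", "c"], ["b", "c"]],
})

# Hypothetical names: the values each column must contain.
required_col2 = {1, 2}
required_col3 = {"a"}

# issubset is True when every required value appears in the cell's list.
mask = [required_col2.issubset(i) and required_col3.issubset(j)
        for i, j in zip(df["col2"], df["col3"])]
result = df[mask]
print(result)
```

Changing the requirements then only means editing the two sets.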

Upvotes: 0

Michael Szczesny

Reputation: 5036

You can find the subset with set operations. A row is kept when 1, 2 and 'a' all appear in the union of its col2 and col3 values:

df[df.apply(lambda x: not {1, 2, 'a'} - (set(x.col2) | set(x.col3)), axis=1)]

Out:

  col1       col2    col3
1    b  [1, 2, 3]  [a, c]
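Note that this answer tests the union of both columns, so a row would also match if "a" appeared in col2 instead of col3. The sketch below adds one hypothetical row (index 4) to the question's data to show the difference against a per-column check:

```python
import pandas as pd

# The question's data plus one hypothetical row where "a" sits in col2, not col3.
df = pd.DataFrame({
    "col1": ["a", "b", "b", "b", "c"],
    "col2": [[1], [1, 2, 3], [1, 2, 3], [1, 3], [1, 2, "a"]],
    "col3": [["a", "b"], ["a", "c"], ["b", "c"], ["b", "c"], ["b"]],
})

# Union check from the answer: all three values anywhere in col2 or col3.
union_mask = df.apply(lambda x: not {1, 2, "a"} - (set(x.col2) | set(x.col3)), axis=1)

# Per-column check: 1 and 2 must be in col2, and "a" must be in col3.
per_column_mask = df.apply(lambda x: {1, 2} <= set(x.col2) and {"a"} <= set(x.col3), axis=1)

print(df[union_mask].index.tolist())       # rows 1 and 4
print(df[per_column_mask].index.tolist())  # row 1 only
```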

Upvotes: 1

Raghav Sharma

Reputation: 195

I think you are looking for this:

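# Flag rows where col2 contains both 1 and 2 and col3 contains 'a'
df['bool'] = df.apply(lambda x: 1 in x['col2'] and 2 in x['col2'] and 'a' in x['col3'], axis=1)
# Keep the flagged rows and drop the helper column from the result
df_subset = df.loc[df['bool']].drop(columns='bool')
print(df_subset)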

This first adds a boolean column, computed with apply, that indicates whether each row matches the conditions, and then filters on that column.
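A variant that avoids both `apply(axis=1)` and the helper column builds one mask per column with `Series.map` and combines them with `&`. This is a sketch, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "b", "b"],
    "col2": [[1], [1, 2, 3], [1, 2, 3], [1, 3]],
    "col3": [["a", "b"], ["a", "c"], ["b", "c"], ["b", "c"]],
})

# One boolean Series per condition, evaluated cell by cell.
has_1_and_2 = df["col2"].map(lambda v: 1 in v and 2 in v)
has_a = df["col3"].map(lambda v: "a" in v)

# Combine the conditions; df itself is left unchanged.
df_subset = df[has_1_and_2 & has_a]
print(df_subset)
```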

Upvotes: 1
