Reputation: 2978
I have a pandas DataFrame that contains lists in some columns:
col1 col2 col3
a [1] [a,b]
b [1,2,3] [a,c]
b [1,2,3] [b,c]
b [1,3] [b,c]
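To reproduce the example, the sample frame can be rebuilt with a short sketch like this one. It assumes the entries shown in col2 are Python integers and the entries shown in col3 are plain strings.

```python
import pandas as pd

# Sample frame from the question; each cell in col2 and col3 is an ordinary Python list
df = pd.DataFrame({
    "col1": ["a", "b", "b", "b"],
    "col2": [[1], [1, 2, 3], [1, 2, 3], [1, 3]],
    "col3": [["a", "b"], ["a", "c"], ["b", "c"], ["b", "c"]],
})
print(df)
```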
I want to get a subset of this DataFrame in which col2 contains both 1 and 2, and col3 contains a.
In my example the answer should be:
col1 col2 col3
b [1,2,3] [a,c]
This is what I tried so far:
df[df["col2"].str.contains("1,2", na=False)]
How can I solve my task?
Upvotes: 2
Views: 49
Reputation: 223
Inline approach
Suppose df is the original data frame:
df[[1 in i and 2 in i and "a" in j for i, j in zip(df.col2, df.col3)]]
This solves the problem, and you can adjust the conditions as required.
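If the required values change often, the same idea can be wrapped in a small function so they are passed in rather than hard-coded. This is a minimal sketch: rows_containing is a hypothetical helper name, not a pandas API, and it assumes every cell in the checked columns is an iterable such as a list. For each column it uses set.issubset to test that all required items appear in that row's list.

```python
import pandas as pd

def rows_containing(df, required):
    """Hypothetical helper: keep rows whose lists hold every required item.

    `required` maps a column name to the items that must all appear in that column.
    """
    mask = pd.Series(True, index=df.index)
    for col, items in required.items():
        wanted = set(items)
        # issubset is True when every wanted item is present in the row's list
        mask &= df[col].map(lambda cell: wanted.issubset(cell))
    return df[mask]

df = pd.DataFrame({
    "col1": ["a", "b", "b", "b"],
    "col2": [[1], [1, 2, 3], [1, 2, 3], [1, 3]],
    "col3": [["a", "b"], ["a", "c"], ["b", "c"], ["b", "c"]],
})

subset = rows_containing(df, {"col2": [1, 2], "col3": ["a"]})
print(subset)
```

Adding another condition then only means adding another entry to the dictionary.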
Upvotes: 0
Reputation: 5036
You can find the subset with set operations:
df[df.apply(lambda x: not {1, 2, 'a'} - (set(x.col2) | set(x.col3)), axis=1)]
Out:
col1 col2 col3
1 b [1, 2, 3] [a, c]
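This answer tests the required values against the union of col2 and col3, so a value counts no matter which of the two columns it appears in. On the question's data that makes no difference, because the integers only occur in col2 and the letters only in col3. The sketch below adds one hypothetical extra row (index 4, invented for illustration) to show where the union check and a per-column check diverge.

```python
import pandas as pd

# The question's frame plus one hypothetical extra row (index 4) in which
# 2 and "a" appear only in col3
df = pd.DataFrame({
    "col1": ["a", "b", "b", "b", "c"],
    "col2": [[1], [1, 2, 3], [1, 2, 3], [1, 3], [1]],
    "col3": [["a", "b"], ["a", "c"], ["b", "c"], ["b", "c"], [2, "a"]],
})

# Union check from the answer: required values may come from either column
union_mask = df.apply(lambda x: not {1, 2, "a"} - (set(x.col2) | set(x.col3)), axis=1)

# Per-column check: 1 and 2 must be in col2, "a" must be in col3
per_column_mask = df.apply(lambda x: {1, 2} <= set(x.col2) and "a" in x.col3, axis=1)

print(df[union_mask])       # rows 1 and 4
print(df[per_column_mask])  # row 1 only
```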
Upvotes: 1
Reputation: 195
I think you are looking for this:
df['bool'] = df.apply(lambda x: True if 1 in x['col2'] and 2 in x['col2'] and 'a' in x['col3']
                      else False, axis=1)
df_subset = df.loc[df['bool'] == True, :]
del df_subset['bool']
print(df_subset)
This first adds a boolean column, computed with apply, that records whether the conditions match, and then filters on that column.
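The helper column is optional: the lambda already returns True or False, so the result of apply can serve directly as a boolean mask. Keeping it in a variable, as in this sketch, leaves df without an extra 'bool' column and makes the del step unnecessary.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "b", "b"],
    "col2": [[1], [1, 2, 3], [1, 2, 3], [1, 3]],
    "col3": [["a", "b"], ["a", "c"], ["b", "c"], ["b", "c"]],
})

# Boolean mask held in a variable, so df itself gains no helper column
mask = df.apply(lambda x: 1 in x["col2"] and 2 in x["col2"] and "a" in x["col3"], axis=1)
df_subset = df[mask]
print(df_subset)
```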
Upvotes: 1