Dan Lim

Reputation: 31

Boolean mask for lists as entries in pandas dataframe

I have a pandas DataFrame whose entries are lists:

import pandas as pd

data = {'col1': [
    ['foo', 'bar', 'baz'],
    ['cat', 'dog', 'horse'],
    [1, 2, 3]
]}

df = pd.DataFrame(data)

I then want to use a boolean mask to return every row whose list contains 'foo' (in this case, row 0). The following returns an empty DataFrame, because == compares each whole list to the string:

df[df['col1'] == 'foo']

The best I have come up with is the following:

df[df['col1'].apply(lambda x: True if 'foo' in x else False)]

but I feel like there is a way to simplify this code. Any suggestions?

Upvotes: 3

Views: 1623

Answers (1)

thomas

Reputation: 449

As Henry already noted in the comments, you can shorten the code by using 'foo' in x directly inside the lambda, since the in expression already evaluates to a boolean. To me, this looks Pythonic enough.

The complete line would be

df[df["col1"].apply(lambda x: 'foo' in x)]
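For reference, here is a minimal self-contained sketch of that mask applied to the frame from the question. The names mask, mask_lc and out are only illustrative, and the list-comprehension variant is an alternative I am adding for comparison, not something from the comments.

```python
import pandas as pd

df = pd.DataFrame({"col1": [
    ["foo", "bar", "baz"],
    ["cat", "dog", "horse"],
    [1, 2, 3],
]})

# Element-wise membership test: one boolean per row.
mask = df["col1"].apply(lambda x: "foo" in x)

# The same mask built with a plain list comprehension instead of apply.
# Reusing df.index keeps the mask aligned with the frame.
mask_lc = pd.Series(["foo" in x for x in df["col1"]], index=df.index)

# Boolean indexing keeps only the rows where the mask is True.
out = df[mask]
print(out)
```

Both masks select the same rows, so which one to use is mostly a matter of taste.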

If you want to avoid the lambda expression you can use:

def inside(my_list, key):
    return key in my_list

out = df[df["col1"].apply(inside, key="foo")]

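This uses a named function defined in advance, which can be reused and extended with extra statements (for example, type checks). A lambda is limited to a single expression, so it is much less convenient to extend.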
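As one illustration of how a named function could be extended, the sketch below adds a guard for entries that are not containers. The messy sample frame (with None and a float) is a hypothetical I made up for the example; the frame in the question holds only lists.

```python
import pandas as pd

def inside(entry, key):
    """Return True if entry is a list, tuple or set that contains key."""
    # Assumption: non-container cells (None, NaN, numbers) should simply
    # count as "no match" instead of raising a TypeError.
    if not isinstance(entry, (list, tuple, set)):
        return False
    return key in entry

# Hypothetical data with some non-list entries mixed in.
df = pd.DataFrame({"col1": [["foo", "bar"], None, ["cat"], 3.5]})

# Extra keyword arguments to Series.apply are forwarded to the function.
mask = df["col1"].apply(inside, key="foo")
out = df[mask]
print(out)
```

Without the isinstance guard, the in test would raise a TypeError on the None and float cells.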

Upvotes: 1
