Yash

Reputation: 357

How to filter a dataframe column having multiple values in Python

I have a data frame that sometimes has multiple values in cells like this:

df:
Fruits
apple, pineapple, mango
guava, blueberry, apple
custard-apple, cranberry
banana, kiwi, peach
apple

Now I want to keep only the rows where apple is one of the values. My output should look like this:

Fruits
apple, pineapple, mango
guava, blueberry, apple
apple

I used str.contains('apple'), but it does not return the desired result, since it also matches the custard-apple row.
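For reference, here is a minimal reproduction of the problem, assuming the column is named Fruits as in the sample above. It only shows that a plain substring match also flags custard-apple; it is not a fix.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Fruits": [
        "apple, pineapple, mango",
        "guava, blueberry, apple",
        "custard-apple, cranberry",
        "banana, kiwi, peach",
        "apple",
    ]
})

# Substring match: True wherever "apple" occurs anywhere in the cell,
# including inside "custard-apple".
substring_mask = df["Fruits"].str.contains("apple")
print(df[substring_mask])
```

The third row is selected because str.contains looks for the text anywhere in the cell rather than comparing whole comma-separated items.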

How can I get this result?

Upvotes: 2

Views: 358

Answers (3)

Jason Baker

Reputation: 3708

You can use .query with .str.contains:

import pandas as pd


data = {
    "Fruits": ["apple, pineapple, mango", "guava, blueberry, apple", "custard-apple, cranberry",
               "banana, kiwi, peach", "apple"]
}

df = pd.DataFrame(data)
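# Keep rows containing "apple" but drop rows containing "-apple" (e.g. "custard-apple").
# engine="python" evaluates the expression with plain Python, which supports the .str accessor.
df = df.query(
    "Fruits.str.contains('apple') & ~Fruits.str.contains('-apple')",
    engine="python",
).reset_index(drop=True)
print(df)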

                    Fruits
0  apple, pineapple, mango
1  guava, blueberry, apple
2                    apple
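One caveat: excluding every cell that contains '-apple' would also drop a row such as "apple, custard-apple", which does list apple as an item. A possible variation, sketched here under the assumption that items are separated by commas with optional spaces, is a regular expression that only matches apple as a whole item. The pattern and the extra last row are illustrative additions, not part of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruits": [
        "apple, pineapple, mango",
        "guava, blueberry, apple",
        "custard-apple, cranberry",
        "banana, kiwi, peach",
        "apple",
        "apple, custard-apple",  # hypothetical extra row for illustration
    ]
})

# "apple" must start the cell or follow a comma, and must be followed
# by a comma or the end of the cell. Non-capturing groups (?:...) are
# used so pandas does not warn about match groups.
pattern = r"(?:^|,\s*)apple(?:\s*,|\s*$)"
mask = df["Fruits"].str.contains(pattern, regex=True)
print(df[mask])
```

With this pattern the custard-apple row is rejected because its apple is preceded by a hyphen, while the hypothetical last row is kept because its first item is exactly apple.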

Upvotes: 1

Quang Hoang

Reputation: 150735

You can split the data on ', ', explode the resulting lists, then compare each item with apple:

mask = df['Fruits'].str.split(', ').explode().eq('apple').groupby(level=0).any()
df[mask]

Output:

                    Fruits
0  apple, pineapple, mango
1  guava, blueberry, apple
4                    apple
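To make the chain easier to follow, the same steps can be written out one at a time. This is a sketch that rebuilds the question's sample data; the intermediate variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruits": [
        "apple, pineapple, mango",
        "guava, blueberry, apple",
        "custard-apple, cranberry",
        "banana, kiwi, peach",
        "apple",
    ]
})

# 1. Split each cell into a list, then give every item its own row.
#    The exploded Series keeps the original row label for each item.
items = df["Fruits"].str.split(", ").explode()

# 2. Exact comparison per item, so "pineapple" and "custard-apple" are False.
is_apple = items.eq("apple")

# 3. Collapse back to one value per original row: True if any item matched.
mask = is_apple.groupby(level=0).any()

result = df[mask]
print(result)
```

Because the comparison is an exact match on each item, the separator passed to split has to match the data; cells with inconsistent spacing would need the items stripped (for example with .str.strip()) before comparing.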

Upvotes: 1

Vivek Menon M

Reputation: 64

Here you go,

apple = df[df.values == "apple"] 
print("The df with apple:", apple)

Upvotes: -1
