Reputation: 1352
I would like to check if items in a List are in a Column from my DF.
The basics where straightforward:
fruit = ['apple','banana'] # This items should be in the column
fruit = ', '.join(fruit) # Think this is the point where it goes wrong...
fruit_resulst = df['all_fruit'].str.contains(fruit) # Check if column contains fruit
df_new = df[fruit_resulst] # Filter so that we only keep the TRUEs
This works, but not completely. It only works in this specific order, but I would like to have it working in all orders (e.g., if a column row contains ALL items from the list, then I would like to keep them. Else, remove.
df['all_fruit']
Apple, Banana #Return! Because it contains apple and banana
Banana # Do not return
Banana, Apple #Return! Because it contains apple and banana
Apple # Do not return
Apple, Banana, Peer #Return! Because it contains apple and banana
Thanks a lot in advance!
Upvotes: 2
Views: 975
Reputation: 862511
Convert values to lowercase, then split to lists and test issubset
by convert fruit
to set
:
df1 = df[df.all_fruit.str.lower().str.split(', ').map(set(fruit).issubset)]
print (df1)
all_fruit
0 Apple, Banana
2 Banana, Apple
4 Apple, Banana, Peer
Your solution with list of boolean masks passed to np.logical_and.reduce:
df1 = df[np.logical_and.reduce([df.all_fruit.str.contains(f, case=False) for f in fruit])]
print (df1)
all_fruit
0 Apple, Banana
2 Banana, Apple
4 Apple, Banana, Peer
Upvotes: 2
Reputation: 71570
Try this code:
x = df['all_fruit'].str.split(',', expand=True)
print(df[x.replace('Apple', '').ne(x).any(1) & x.replace(' Banana', '').ne(x).any(1)])
Output:
all_fruit
0 Apple, Banana
2 Banana, Apple
4 Apple, Banana, Peer
Upvotes: 1
Reputation: 333
df = pd.DataFrame({'all_fruit': [
'Apple, Banana',
'Banana',
'Banana, Apple',
'Apple',
'Apple, Banana, Peer',
]})
fruit = ['apple','banana']
have_fruits = [df.all_fruit.str.contains(f, case=False) for f in fruit]
indexes = True
for f in have_fruits:
indexes = indexes * f
df[indexes]
Upvotes: 1