Reputation: 4439
I have a bunch of dataframes, and I want to find the ones that contain both of the words I specify. For example, I want to find all dataframes that contain the words hello and world. A and B would qualify; C would not.
I've tried:
df[(df[column].str.contains('hello')) & (df[column].str.contains('world'))]
which only picks up B, and
df[(df[column].str.contains('hello')) | (df[column].str.contains('world'))]
which picks up all three.
I need something that picks only A and B.
A=
Name Data
0 Mike hello
1 Mike world
2 Mike hello
3 Fred world
4 Fred hello
5 Ted world
B =
Name Data
0 Mike helloworld
1 Mike world
2 Mike hello
3 Fred world
4 Fred hello
5 Ted world
C=
Name Data
0 Mike hello
1 Mike hello
2 Mike hello
3 Fred hello
4 Fred hello
5 Ted hello
Upvotes: 4
Views: 123
Reputation: 59549
You want a single bool value indicating whether 'hello' is found anywhere and 'world' is found anywhere in one column:
df.Data.str.contains('hello').any() & df.Data.str.contains('world').any()
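To apply that column-wide check to several dataframes at once, something like the following might work. It is only a sketch: the helper name contains_all and the frames dict are made up for illustration, and the data is rebuilt from A, B and C in the question.

```python
import pandas as pd

# Rebuild the three example frames from the question.
names = ["Mike", "Mike", "Mike", "Fred", "Fred", "Ted"]
frames = {
    "A": pd.DataFrame({"Name": names,
                       "Data": ["hello", "world", "hello", "world", "hello", "world"]}),
    "B": pd.DataFrame({"Name": names,
                       "Data": ["helloworld", "world", "hello", "world", "hello", "world"]}),
    "C": pd.DataFrame({"Name": names, "Data": ["hello"] * 6}),
}


def contains_all(df, column, words):
    """Hypothetical helper: True if every word occurs as a substring
    somewhere in df[column] (not necessarily in the same row)."""
    return all(df[column].str.contains(word, regex=False).any() for word in words)


# Keep only the frames whose Data column contains both words.
matching = [name for name, df in frames.items()
            if contains_all(df, "Data", ["hello", "world"])]
print(matching)  # ['A', 'B']
```

The row-wise mask in the question requires both words in the same row, which is why it only matched B. Reducing each condition with .any() before combining them removes that requirement.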
If you have a list of words and need to check over the entire DataFrame, try:
import numpy as np
lst = ['hello', 'world']
np.logical_and.reduce([any(word in x for x in df.values.ravel()) for word in lst])
print(df)
Name Data Data2
0 Mike hello orange
1 Mike world banana
2 Mike hello banana
3 Fred world apples
4 Fred hello mango
5 Ted world pear
lst = ['apple', 'hello', 'world']
np.logical_and.reduce([any(word in x for x in df.values.ravel()) for word in lst])
# True
lst = ['apple', 'hello', 'world', 'bear']
np.logical_and.reduce([any(word in x for x in df.values.ravel()) for word in lst])
# False
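One caveat: word in x raises a TypeError if a cell is not a string (an int column, for example). A possible workaround, assuming it is acceptable to compare against the string form of every cell, is to cast first. The helper name frame_contains_all and the Count column are invented for this sketch.

```python
import pandas as pd


def frame_contains_all(df, words):
    """Hypothetical helper: True if every word is a substring of at least
    one cell anywhere in df, after casting every cell to str."""
    cells = df.astype(str).to_numpy().ravel()
    return all(any(word in cell for cell in cells) for word in words)


# Example frame with a numeric column, which would break a plain `word in x`.
df = pd.DataFrame({
    "Name": ["Mike", "Mike", "Fred"],
    "Data": ["hello", "world", "hello"],
    "Count": [1, 2, 3],
})

print(frame_contains_all(df, ["hello", "world"]))  # True
print(frame_contains_all(df, ["hello", "bear"]))   # False
```

Note that the cast turns missing values into the text 'nan', so a search word such as 'nan' would match them.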
Upvotes: 5
Reputation: 38415
If hello and world are standalone strings in your data, df.eq() should do the job and you don't need str.contains. It's not a string method, and it works on the entire dataframe.
(((df == 'hello').any()) & ((df == 'world').any())).any()
True
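The difference between the equality test and str.contains only shows up when a word is embedded in a longer string. A small sketch with a made-up two-row frame:

```python
import pandas as pd

# Made-up data: 'hello' and 'world' appear only inside a longer string.
df = pd.DataFrame({"Name": ["Mike", "Fred"], "Data": ["helloworld", "other"]})

# Equality test: only cells that are exactly 'hello' / 'world' count.
exact = bool(((df == "hello").any() & (df == "world").any()).any())

# Substring test on the Data column.
substring = bool(
    df["Data"].str.contains("hello", regex=False).any()
    and df["Data"].str.contains("world", regex=False).any()
)

print(exact)      # False
print(substring)  # True
```

Also note that the equality expression combines the per-column results with & before the final .any(), so both words have to appear in the same column for it to return True.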
Upvotes: 1
Reputation: 323306
Using a regex with lookaheads on the concatenated cell contents:
import re
bool(re.search(r'^(?=.*hello)(?=.*world)', df.sum().sum()))
Out[461]: True
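Concatenating every cell into one string can, in principle, produce a match that spans two adjacent cells ('hel' followed by 'lo', say). A hedged variation is to join the cells with a separator that cannot occur inside a search word and to build the lookaheads from a list. The helper name lookahead_contains_all and the newline separator are assumptions for this sketch.

```python
import re

import pandas as pd


def lookahead_contains_all(df, words, sep="\n"):
    """Hypothetical helper: True if every word appears inside some single cell.

    Cells are joined with `sep` (assumed never to occur in a search word),
    so a word cannot match across a cell boundary.
    """
    text = sep.join(df.astype(str).to_numpy().ravel())
    # One lookahead per word; re.escape keeps regex metacharacters literal.
    pattern = "^" + "".join(f"(?=.*{re.escape(word)})" for word in words)
    # DOTALL lets '.' cross the newline separators between cells.
    return re.search(pattern, text, flags=re.DOTALL) is not None


split_cells = pd.DataFrame({"Data": ["hel", "lo", "world"]})
whole_cells = pd.DataFrame({"Data": ["hello", "world"]})

print(lookahead_contains_all(split_cells, ["hello", "world"]))  # False
print(lookahead_contains_all(whole_cells, ["hello", "world"]))  # True
```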
Upvotes: 2