Mars

Reputation: 351

How to filter rows containing specific string values with an AND operator

My question is kind of an extension of the question answered quite well in this link:

I've copied the answer below; it keeps only the rows whose string contains the word "ball":

In [3]: df[df['ids'].str.contains("ball")]
Out[3]:
     ids     vals
0  aball     1
1  bball     2
3  fball     4

Now my question is: what if I have long sentences in my data and want to identify the strings that contain both "ball" AND "field"? Rows where only one of the two words occurs should be dropped, and only rows whose string contains both words should be kept.

Upvotes: 6

Views: 7380

Answers (4)

BENY

Reputation: 323316

If you have more than two words, you can use this. (Note that it is slower than foxyblue's method.)

l = ['ball', 'field']
df.ids.apply(lambda x: all(y in x for y in l))
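The expression above returns a boolean Series, so it still has to be used as a mask to get the filtered rows. A minimal self-contained sketch, assuming a small made-up DataFrame with an `ids` column (the sample data and the names `words`, `mask`, and `filtered` are illustrative only):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {"ids": ["ball and field", "ball, just ball", "field alone", "field and ball"]}
)

words = ["ball", "field"]

# apply() yields a boolean Series: True where every word is a substring of the row.
mask = df["ids"].apply(lambda text: all(word in text for word in words))

# Index the DataFrame with the mask to keep only the matching rows.
filtered = df[mask]
print(filtered)
```

Because the lambda runs once per row in Python, this is likely to be slower on large frames than combining vectorized `str.contains` masks.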

Upvotes: 2

MaxU - stand with Ukraine

Reputation: 210882

Yet another RegEx approach:

In [409]: df
Out[409]:
               ids
0   ball and field
1  ball, just ball
2      field alone
3  field and ball

In [410]: pat = r'(?:ball.*field|field.*ball)'

In [411]: df[df['ids'].str.contains(pat)]
Out[411]:
               ids
0   ball and field
3  field and ball
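Spelling out every ordering by hand grows quickly with more than two words. One possible alternative, sketched here under the assumption that the words are held in a list (the name `words` and the sample data are illustrative), is to build one lookahead per word so that order no longer matters:

```python
import re

import pandas as pd

# Hypothetical sample data mirroring the example above.
df = pd.DataFrame(
    {"ids": ["ball and field", "ball, just ball", "field alone", "field and ball"]}
)

words = ["ball", "field"]

# One lookahead per word, e.g. (?=.*ball)(?=.*field).
# re.escape guards against regex metacharacters inside the words.
pat = "".join(f"(?=.*{re.escape(word)})" for word in words)

result = df[df["ids"].str.contains(pat)]
print(result)
```

Note that `.` does not match newlines by default, so multi-line strings would need `flags=re.DOTALL` passed to `str.contains`.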

Upvotes: 0

Zero

Reputation: 76947

You could use np.logical_and.reduce to combine one str.contains mask per word, which scales to any number of words.

df[np.logical_and.reduce([df['ids'].str.contains(w) for w in ['ball', 'field']])]

In [96]: df
Out[96]:
             ids
0  ball is field
1     ball is wa
2  doll is field

In [97]: df[np.logical_and.reduce([df['ids'].str.contains(w) for w in ['ball', 'field']])]
Out[97]:
             ids
0  ball is field
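For completeness, a self-contained sketch of the same idea with the imports included. It assumes the column may contain missing values, which is not stated in the question, so `na=False` is passed to treat them as non-matches; `regex=False` is an additional assumption that the words are plain substrings rather than patterns:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the None row is added to illustrate missing values.
df = pd.DataFrame({"ids": ["ball is field", "ball is wa", "doll is field", None]})

words = ["ball", "field"]

# One boolean mask per word; missing values count as "does not contain".
masks = [df["ids"].str.contains(word, regex=False, na=False) for word in words]

# Element-wise AND across all masks, giving a single boolean array.
mask = np.logical_and.reduce(masks)

result = df[mask]
print(result)
```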

Upvotes: 0

s3bw

Reputation: 3049

df[df['ids'].str.contains("ball")]

Would become:

df[df['ids'].str.contains("ball") & df['ids'].str.contains("field")]

If you are into neater code:

contains_balls = df['ids'].str.contains("ball")
contains_fields = df['ids'].str.contains("field")

filtered_df = df[contains_balls & contains_fields]
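One caveat: `str.contains("ball")` matches substrings, so "football" also counts as containing "ball". If whole words are what is wanted, which is an assumption about the question's intent, a sketch using regex word boundaries could look like this (the sample data and variable names are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {"ids": ["the ball is on the field", "football field", "ball only"]}
)

# \b marks a word boundary, so "football" does not count as the word "ball".
has_ball = df["ids"].str.contains(r"\bball\b")
has_field = df["ids"].str.contains(r"\bfield\b")

whole_word_df = df[has_ball & has_field]
print(whole_word_df)
```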

Upvotes: 5
