user213043

Reputation: 171

How do I include a boolean AND within a regex?

Is there a way to get a single regex to satisfy this condition?

I am looking for a "word" that has three letters from the set MBDPI, in any order, but that MUST contain an I.

That is:

re.match("[MBDPI]{3}", foo) and "I" in foo

This gives the correct result (in Python, using the re module), but can I get it from a single regex?

>>> import re
>>> for foo in ("MBI", "MIB", "BIM", "BMI", "IBM", "IMB", "MBD"):
...     print(foo, re.match("[MBDPI]{3}", foo) and "I" in foo)
...
MBI True
MIB True
BIM True
BMI True
IBM True
IMB True
MBD False

With regex, I know I can use | as a boolean OR operator, but is there a boolean AND equivalent?

Or do I need some kind of lookahead or lookbehind?

Upvotes: 12

Views: 20389

Answers (4)

Jan Heldal

Reputation: 176

with regex I know I can use | as a boolean OR operator, but is there a boolean AND equivalent?

A and B = not (not A or not B) = (?![^A]|[^B])

Here A and B are character sets, which may have members in common.
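A minimal Python sketch of this De Morgan identity, assuming A and B are single-character classes. The sets [a-m] and [h-z] are arbitrary examples chosen for illustration, not from the question. Because the negative lookahead is zero-width, a trailing . is appended so the pattern actually consumes the character that passed both tests:

```python
import re

# Hypothetical example sets: A = [a-m], B = [h-z].
# (?![^a-m]|[^h-z]) succeeds only if the next character is neither outside A
# nor outside B, i.e. it lies in both sets. The trailing "." then consumes it.
A_AND_B = re.compile(r"(?![^a-m]|[^h-z]).")

print(A_AND_B.findall("abhijmnz"))  # ['h', 'i', 'j', 'm']
```

Only h, i, j and m fall in the overlap h-m, so only they survive.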

Upvotes: 5

Jens

Reputation: 25563

You can fake boolean AND by using lookaheads. According to http://www.regular-expressions.info/lookaround2.html, this will work for your case:

r"\b(?=[MBDPI]{3}\b)\w*I\w*"
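A short Python sketch of how this pattern behaves on a sample string (the sample text is made up for illustration). Note the raw-string prefix: in a plain Python string literal, "\b" is a backspace character rather than a word boundary:

```python
import re

# The lookahead checks "exactly three letters from MBDPI, then a word boundary"
# without consuming anything; \w*I\w* then re-scans the same word and
# additionally requires an I somewhere in it.
PATTERN = re.compile(r"\b(?=[MBDPI]{3}\b)\w*I\w*")

print(PATTERN.findall("MBI MIB BIM MBD IBMX"))  # ['MBI', 'MIB', 'BIM']
```

MBD passes the lookahead but has no I, and IBMX fails the lookahead because the word is four letters long.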

Upvotes: 5

cletus

Reputation: 625147

Alternation (OR) is about the only thing you can do:

\b(I[MBDPI]{2}|[MBDPI]I[MBDPI]|[MBDPI]{2}I)\b

\b is a zero-width word-boundary assertion. Placing it on both sides ensures you match a word that is exactly three characters long.

Otherwise, you're running into the limits of what classic regular-expression syntax can express.
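A minimal Python check of the alternation version against the question's test words, plus a four-letter word (MBIX, added here for illustration) to show the \b length check at work. search() is used instead of fullmatch() so that the \b anchors, not the method, do the length checking:

```python
import re

# Enumerate the three positions the mandatory I can occupy, joined by |.
ALTERNATION = re.compile(r"\b(I[MBDPI]{2}|[MBDPI]I[MBDPI]|[MBDPI]{2}I)\b")

words = ("MBI", "MIB", "BIM", "BMI", "IBM", "IMB", "MBD", "MBIX")
results_alt = {foo: ALTERNATION.search(foo) is not None for foo in words}
print(results_alt)
```

All six permutations containing I match; MBD (no I) and MBIX (too long) do not.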

An alternative is to match:

\b[MBDPI]{3}\b

capture the match, and then check it for an I.
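The two-step approach might look like this in Python; the function name three_letter_words_with_i is hypothetical, chosen only for this sketch:

```python
import re

def three_letter_words_with_i(text):
    """Hypothetical helper: regex for shape, plain Python for the 'contains I' test."""
    # Step 1: every word that is exactly three letters from MBDPI.
    candidates = re.findall(r"\b[MBDPI]{3}\b", text)
    # Step 2: keep only the candidates that contain an I.
    return [word for word in candidates if "I" in word]

print(three_letter_words_with_i("MBI MBD IBM PPP MBIX"))  # ['MBI', 'IBM']
```

Splitting the check this way keeps each part trivially readable, at the cost of not being a single regex.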

Edit: for the sake of having a complete answer, I'll adapt Jens' answer, which uses the "Testing the Same Part of a String for More Than One Requirement" technique:

\b(?=[MBDPI]{3}\b)\w*I\w*

with the word-boundary checks ensuring the match is exactly three characters long.

This is a more advanced solution and applies in more situations, but I'd generally favour whatever is easier to read (the "or" version, IMHO).

Upvotes: 3

Bart Kiers

Reputation: 170188

You could use lookahead to see if an I is present:

(?=[MBDPI]{0,2}I)[MBDPI]{3}
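A quick Python check of this lookahead pattern. The pattern itself has no anchors, so this sketch assumes each candidate is tested on its own and uses fullmatch() (Python 3.4+) to reject longer strings such as MBIX, which is added here for illustration:

```python
import re

# Lookahead: an I appears within the first three characters (after 0-2 set letters).
# The consuming part then matches exactly three letters from MBDPI.
LOOKAHEAD = re.compile(r"(?=[MBDPI]{0,2}I)[MBDPI]{3}")

words = ("MBI", "MIB", "BIM", "BMI", "IBM", "IMB", "MBD", "MBIX")
results_la = {foo: LOOKAHEAD.fullmatch(foo) is not None for foo in words}
print(results_la)
```

When searching inside running text instead, you would wrap the pattern in \b on both sides, as in the other answers.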

Upvotes: 2
