Tim
Tim

Reputation: 99438

Regex for existence of some words whose order doesn't matter

I would like to write a regex for searching for the existence of some words, but their order of appearance doesn't matter.

For example, search for "Tim" and "stupid". My regex is Tim.*stupid|stupid.*Tim. But is it possible to write a simpler regex (e.g. so that the two words appear just once in the regex itself)?

Upvotes: 25

Views: 20098

Answers (2)

Unihedron
Unihedron

Reputation: 11041

See this regex:

/^(?=.*Tim)(?=.*stupid).+/

Regex explanation:

  • ^ Asserts position at start of string.
  • (?=.*Tim) Asserts that "Tim" is present in the string.
  • (?=.*stupid) Asserts that "stupid" is present in the string.
  • .+Now that our phrases are present, this string is valid. Go ahead and use .+ or - .++ to match the entire string.

To use lookaheads more exclusively, you can add another (?=.*<to_assert>) group. The entire regex can be simplified as /^(?=.*Tim).*stupid/.

See a regex demo!

>>> import re
>>> str ="""
... Tim is so stupid.
... stupid Tim!
... Tim foobar barfoo.
... Where is Tim?"""
>>> m = re.findall(r'^(?=.*Tim)(?=.*stupid).+$', str, re.MULTILINE)
>>> m
['Tim is so stupid.', 'stupid Tim!']
>>> m = re.findall(r'^(?=.*Tim).*stupid', str, re.MULTILINE)
>>> m
['Tim is so stupid.', 'stupid Tim!']

Read more:

Upvotes: 55

hwnd
hwnd

Reputation: 70732

You can use Positive Lookahead to achieve this. The lookahead approach is nice for matching strings that contain both substrings regardless of order.

pattern = re.compile(r'^(?=.*Tim)(?=.*stupid).*$')

Example:

>>> s = '''Hey there stupid, hey there Tim
Hi Tim, this is stupid
Hi Tim, this is great'''
...
>>> import re
>>> pattern = re.compile(r'^(?=.*Tim)(?=.*stupid).*$', re.M)
>>> pattern.findall(s)

# ['Hey there stupid, hey there Tim', 'Hi Tim, this is stupid']

Upvotes: 7

Related Questions