aeatest

Reputation: 21

Write a regular expression in Python to match two specific words, allowing for a set number of words between them

I am having some difficulty writing a regular expression in Python to identify strings that contain two words, in a specific order, with 2 to 4 words between them. For example, given the phrase 'fired job', I would want the string 'I was fired from my job' to be identified. My initial thought was that the best way to do this is to allow for 2 to 4 spaces between the two words. I wrote the following, which does not seem to work, and would appreciate input.

re.search('(fired)(\s{2,4})(job)','I was fired from my job')

Upvotes: 0

Views: 1449

Answers (1)

Ryszard Czech

Reputation: 18611

In (fired)(\s{2,4})(job) the \s{2,4} matches 2-4 whitespace characters and does not allow for optional words between the fired and job substrings.
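
As a quick check of that behaviour, the sketch below runs the original pattern against the sample sentence and against a string where the two words are separated only by whitespace. The second test string is made up for illustration.

```python
import re

pattern = r"(fired)(\s{2,4})(job)"

# Words sit between 'fired' and 'job': \s{2,4} cannot consume 'from' or 'my',
# so there is no match.
no_match = re.search(pattern, "I was fired from my job")

# Only whitespace (three spaces) sits between the two words: this matches.
ws_match = re.search(pattern, "fired   job")

print(no_match)               # None
print(ws_match is not None)   # True
```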

Use

\bfired(?:\s+\S+){0,2}\s+job\b


EXPLANATION

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  fired                    'fired'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (between 0 and 2
                           times (matching the most amount
                           possible)):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  ){0,2}                   end of grouping
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  job                      'job'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char

Python code:

import re
s = 'I was fired from my job'
if re.search(r"\bfired(?:\s+\S+){0,2}\s+job\b", s):
    print("Matched!")
else:
    print("Not matched.")

Results: Matched!
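
Note that {0,2} permits zero to two words between fired and job, which covers the sample sentence. If the gap should instead be strictly 2 to 4 words, as the question describes, the quantifier bounds can be made parameters. The following is only a minimal sketch: build_gap_pattern and its arguments are hypothetical names (not part of the re module), and a "word" is assumed to be any run of non-whitespace characters.

```python
import re

def build_gap_pattern(first, second, min_words, max_words):
    """Hypothetical helper: compile a regex that matches `first`, then
    min_words..max_words whitespace-separated tokens, then `second`."""
    # Each intervening word is "whitespace followed by non-whitespace".
    gap = r"(?:\s+\S+){%d,%d}" % (min_words, max_words)
    return re.compile(
        r"\b" + re.escape(first) + gap + r"\s+" + re.escape(second) + r"\b"
    )

two_to_four = build_gap_pattern("fired", "job", 2, 4)

print(bool(two_to_four.search("I was fired from my job")))           # True  (2 words between)
print(bool(two_to_four.search("I was fired job")))                   # False (0 words between)
print(bool(two_to_four.search("fired from that awful boring job")))  # True  (4 words between)
print(bool(two_to_four.search("fired a b c d e job")))               # False (5 words between)
```

Calling build_gap_pattern("fired", "job", 0, 2) yields the same pattern as the one used above.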

Upvotes: 2
