Reputation: 21
I am having some difficulty writing a regular expression function in Python to identify strings in which two words appear in a specific order, with 2 to 4 words between them. For example, given the phrase 'fired job', I would want the string 'I was fired from my job' to be identified. My initial thought was that the best way to do this is to allow for 2 to 4 spaces between the words. I wrote the following, which does not seem to work, and would appreciate input.
re.search('(fired)(\s{2,4})(job)','I was fired from my job')
Upvotes: 0
Views: 1449
Reputation: 18611
In (fired)(\s{2,4})(job), the \s{2,4} matches 2-4 whitespace characters and does not allow for optional words between the fired and job substrings.
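A minimal sketch of why the original attempt fails, using the pattern and test string from the question. The variable names (`original`, `no_match`, `spaces_only`) and the second test string are only illustrative.

```python
import re

text = 'I was fired from my job'
original = r'(fired)(\s{2,4})(job)'

# "fired" and "job" are separated by words here, not by a run of
# 2-4 whitespace characters, so the search finds nothing.
no_match = re.search(original, text)

# The same pattern does match when only whitespace (three spaces here)
# sits between the two words.
spaces_only = re.search(original, 'fired   job')

print(no_match)
print(spaces_only is not None)
```

In other words, \s{2,4} counts whitespace characters in a single run, not the gaps between words.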
Use
\bfired(?:\s+\S+){0,2}\s+job\b
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  fired                    'fired'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (between 0 and 2
                           times (matching the most amount
                           possible)):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  ){0,2}                   end of grouping
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  job                      'job'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
import re
s = 'I was fired from my job'
if re.search(r"\bfired(?:\s+\S+){0,2}\s+job\b", s):
    print("Matched!")
else:
    print("Not matched.")
Results: Matched!
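The pattern above allows 0 to 2 words between the two target words. If the requirement is 2 to 4 words, as the question states, the bounds of the quantifier can be changed. Below is a rough sketch of a reusable helper. The function name `words_within` and its `min_gap` / `max_gap` parameters are hypothetical, not part of the `re` module, and it assumes that a "word" is any run of non-whitespace characters.

```python
import re

def words_within(first, second, text, min_gap=0, max_gap=2):
    """Return True if `first` is followed by `second` with between
    min_gap and max_gap whitespace-separated words in between.

    Hypothetical helper; a "word" is assumed to be any run of
    non-whitespace characters (\\S+).
    """
    pattern = (
        rf"\b{re.escape(first)}"               # first word, escaped literally
        rf"(?:\s+\S+){{{min_gap},{max_gap}}}"  # min_gap..max_gap intervening words
        rf"\s+{re.escape(second)}\b"           # whitespace, then the second word
    )
    return re.search(pattern, text) is not None

s = 'I was fired from my job'
print(words_within('fired', 'job', s))                        # default bounds {0,2}
print(words_within('fired', 'job', s, min_gap=2, max_gap=4))  # bounds from the question
print(words_within('fired', 'job', 'fired job', min_gap=2, max_gap=4))
```

With the sample sentence, both the default bounds and the 2 to 4 bounds match, because exactly two words ("from my") separate "fired" and "job". The last call does not match, since no words sit between the two.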
Upvotes: 2