I'm trying to write a regular expression that extracts sentences that don't contain specific words. Here is a simple example:
Input: Sagittal scout images cervicothoracic : Mild-to-moderate multilevel spondylosis. Fracture present.
Desired Output: Fracture present.
Attempt #1:
Regex:
[^.]*(?!cervi(c|x))[^.]*\.
Actual Output: Sagittal scout images cervicothoracic : Mild-to-moderate multilevel spondylosis. Fracture present.
Attempt #2:
Regex:
[^.]*[^(cervi(c|x))][^.]*\.
Actual Output: Sagittal scout images cervicothoracic : Mild-to-moderate multilevel spondylosis. Fracture present.
You can verify these results at https://regexr.com/
Use this pattern; the sentence you want is captured in group 1:
(?<![^.])\s*((?:(?!cervi[cx])[^.])*\.)
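A minimal sketch of using the pattern, assuming Python's `re` module (the answer does not name a language). Because the pattern has exactly one capturing group, `re.findall` returns only the captured sentences, without the leading whitespace consumed by `\s*`. The second sample string is made up for illustration:

```python
import re

# Pattern from the answer: group 1 is a '.'-terminated sentence in which
# no position starts "cervic" or "cervix".
pattern = re.compile(r"(?<![^.])\s*((?:(?!cervi[cx])[^.])*\.)")

text = ("Sagittal scout images cervicothoracic : Mild-to-moderate "
        "multilevel spondylosis. Fracture present.")
print(pattern.findall(text))  # ['Fracture present.']

# Hypothetical multi-sentence sample: only the sentence mentioning "cervix" is dropped.
sample = "Cyst in the cervix. Lungs clear. Fracture present."
print(pattern.findall(sample))  # ['Lungs clear.', 'Fracture present.']
```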
Explanation
(?<!                  look behind to see if there is not:
  [^.]                  any character except: '.'
)                     end of look-behind
\s*                   whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
(                     group and capture to \1:
  (?:                   group, but do not capture (0 or more times (matching the most amount possible)):
    (?!                   look ahead to see if there is not:
      cervi                 'cervi'
      [cx]                  any character of: 'c', 'x'
    )                     end of look-ahead
    [^.]                  any character except: '.'
  )*                    end of grouping
  \.                    '.'
)                     end of \1
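The breakdown shows the core idea: each `[^.]` is consumed only at a position where the excluded word does not begin, so the captured group can never run across `cervic` or `cervix`. The sketch below generalises that to a list of excluded words, again assuming Python's `re` module. The helper name `sentences_without` and its `ignore_case` parameter are hypothetical, and it assumes every sentence ends with a '.', as in the question:

```python
import re

def sentences_without(text, stems, ignore_case=False):
    """Return the '.'-terminated sentences of text that contain none of stems.

    Hypothetical helper: it builds the same tempered token as the answer,
    (?:(?!stem1|stem2|...)[^.])*, from a list of literal stems.
    """
    # Escape each stem so characters like '.' or '(' are matched literally.
    excluded = "|".join(re.escape(s) for s in stems)
    pattern = rf"(?<![^.])\s*((?:(?!{excluded})[^.])*\.)"
    flags = re.IGNORECASE if ignore_case else 0
    # One capturing group, so findall returns just the sentences.
    return re.findall(pattern, text, flags)

report = "Cervical spine normal. Lungs clear. Fracture present."
print(sentences_without(report, ["cervic", "cervix"]))
print(sentences_without(report, ["cervic", "cervix"], ignore_case=True))
```

The lookahead is case-sensitive by default, so the first call keeps "Cervical spine normal." and the second call, with `re.IGNORECASE`, drops it.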