Reputation: 377
I need some help writing a regex in Python. I want to find a word (here, diagnosis) together with the word before it, and insert "_" between the two. My inputs look like the following:
Input:
s2 = 'Some other medical terms and stuff diagnosis of R45.2 was entered for this patient. Where did Doctor Who go? Then xxx feea fdsfd'
# my regex pattern
re.sub(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,1}diagnosis", r"\1_", s2)
Desired Output:
s2 = 'Some other medical terms and stuff_diagnosis of R45.2 was entered for this patient. Where did Doctor Who go? Then xxx feea fdsfd'
Upvotes: 2
Views: 140
Reputation: 627082
You have no capturing group defined in your regex, but you are using the \1 placeholder (replacement backreference) to refer to one.
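To make that concrete, here is a minimal sketch that runs the pattern from the question unchanged. The variable names pattern and message are only for illustration. In CPython, referencing a group that does not exist in the replacement makes re.sub raise re.error rather than silently inserting nothing:

```python
import re

s2 = ("Some other medical terms and stuff diagnosis of R45.2 was entered "
      "for this patient. Where did Doctor Who go? Then xxx feea fdsfd")

# Pattern copied from the question: (?:...) is a NON-capturing group.
pattern = r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,1}diagnosis"

# The compiled pattern reports how many capturing groups it has.
print(re.compile(pattern).groups)  # 0

# Using \1 in the replacement therefore refers to a group that does not exist.
try:
    re.sub(pattern, r"\1_", s2)
    message = None
except re.error as exc:
    message = str(exc)

print(message)
```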
You want to replace one or more chars that are not word chars, - or ' before the word diagnosis, so you may use
re.sub(r"[^\w'-]+(?=diagnosis)", "_", s2)
See this regex demo.
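As a quick check, this sketch applies the lookahead pattern to the s2 string from the question (the name result is just for illustration):

```python
import re

s2 = ("Some other medical terms and stuff diagnosis of R45.2 was entered "
      "for this patient. Where did Doctor Who go? Then xxx feea fdsfd")

# Replace the run of separator chars right before "diagnosis" with "_".
# The lookahead only checks for "diagnosis"; it is not part of the match,
# so the word itself is left in place.
result = re.sub(r"[^\w'-]+(?=diagnosis)", "_", s2)

print(result)
# Some other medical terms and stuff_diagnosis of R45.2 was entered for this patient. Where did Doctor Who go? Then xxx feea fdsfd
```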
Details
[^\w'-]+ - one or more chars other than word chars, ' and -
(?=diagnosis) - a positive lookahead that does not consume text (it does not add to the match value, so re.sub does not remove this piece of text); it only requires diagnosis to appear immediately to the right of the current location.
Or
re.sub(r"[^\w'-]+(diagnosis)", r"_\1", s2)
See this regex demo. Here, [^\w'-]+ also matches those special chars, but (diagnosis) is a capturing group whose text can be referred to using the \1 placeholder in the replacement pattern.
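A minimal sketch of the capturing-group variant on the same s2 string (the names with_group and with_lookahead are only for illustration). Here the word is consumed by the match, so the replacement has to put it back with \1. On this input it gives the same result as the lookahead pattern:

```python
import re

s2 = ("Some other medical terms and stuff diagnosis of R45.2 was entered "
      "for this patient. Where did Doctor Who go? Then xxx feea fdsfd")

# (diagnosis) is capturing group 1; r"_\1" writes "_" followed by its text.
with_group = re.sub(r"[^\w'-]+(diagnosis)", r"_\1", s2)

# Lookahead version for comparison: nothing to restore in the replacement.
with_lookahead = re.sub(r"[^\w'-]+(?=diagnosis)", "_", s2)

print(with_group)
print(with_group == with_lookahead)  # True
```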
NOTE: If you want to make sure diagnosis is matched as a whole word, use \b around it, \bdiagnosis\b (mind the r raw string literal prefix!).
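To illustrate the note, here is a small sketch on a made-up string (text and the token diagnosiscode are hypothetical, chosen only to show the difference). It also shows why the r prefix matters: in a normal string literal, "\b" is a backspace character, not a word boundary.

```python
import re

# Hypothetical input: "diagnosiscode" merely starts with "diagnosis".
text = "first diagnosis, then diagnosiscode"

# Without word boundaries, both occurrences are affected.
loose = re.sub(r"[^\w'-]+(?=diagnosis)", "_", text)

# With \b on both sides, only the standalone word is affected.
strict = re.sub(r"[^\w'-]+(?=\bdiagnosis\b)", "_", text)

print(loose)   # first_diagnosis, then_diagnosiscode
print(strict)  # first_diagnosis, then diagnosiscode

# Without the r prefix, "\b" is a single backspace character (0x08).
print("\b" == "\x08", len(r"\b"))  # True 2
```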
Upvotes: 2