BoobaGump

Reputation: 535

Regex: match line by line and stop when a specific string appears

I'm using re with Python and I can't get my logic to work for this problem. My text is the following:

(...)
 Zelite and FOS ont are limiting small bowel disturbance. 
Indications :
chronical kidney disease (IRC)
management of urolithiasis and low tract urinary syndrome
hepatic encephalitis
management of acidophils urinary stone : purine, cystine…
Contraindication :
pregnancy, lactation, growth
Length of the treatment : ... 
(...)

I just want to get the lines between Indications and Contraindication, with each line as a separate group.

So far I'm nearly there, but it's not exactly what I want:

([I,i]ndication[s]*\s*\:{0,1})(\s*.*\n)*? Contraindication

which gives me:

Indications :
    chronical kidney disease (IRC)
    management of urolithiasis and low tract urinary syndrome
    hepatic encephalitis
    management of acidophils urinary stone : purine, cystine…
 Contraindication

I would like to get rid of "Contraindication", but a negative lookahead doesn't seem to work with :?. I don't know why. A .replace("Contraindication","") is always possible, but I'm looking for a proper regex solution.
I don't know if it's possible with regex, but is it possible to have a group for each line (between Indications and Contraindication) without knowing in advance how many lines there will be?

You can check what I did here on the Regex simulator

Have a great day

Upvotes: 0

Views: 36

Answers (1)

emsimpson92

Reputation: 1778

A negative lookahead matches only where the text is not followed by whatever is in the lookahead. What you want is a positive lookahead: it matches only where the text is followed by whatever is in the lookahead, without including the lookahead in your match. In this case, you can do this:

(?s)(?<=[iI]ndications :).*(?=Contraindication)

As you can see here, it captures exactly what you want.
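A minimal Python sketch of the pattern above with `re.search`, assuming the text is held in a string; `text`, `block` and `negative` are illustrative names, and `text` is only the visible part of the sample from the question. For contrast, it also swaps the lookahead for a negative lookahead: the greedy `.*` then only has to end somewhere not followed by `Contraindication`, so it runs on to the end of the text.

```python
import re

# Visible part of the sample text from the question.
text = """Zelite and FOS ont are limiting small bowel disturbance.
Indications :
chronical kidney disease (IRC)
management of urolithiasis and low tract urinary syndrome
hepatic encephalitis
management of acidophils urinary stone : purine, cystine…
Contraindication :
pregnancy, lactation, growth
Length of the treatment : ...
"""

# Positive lookahead: stop right before "Contraindication" without consuming it.
block = re.search(r"(?s)(?<=[iI]ndications :).*(?=Contraindication)", text).group()
# The match starts with the newline that follows "Indications :", hence strip().
print(block.strip())

# Negative lookahead instead: the greedy .* only has to end at a position NOT
# followed by "Contraindication", so it runs all the way to the end of the text.
negative = re.search(r"(?s)(?<=[iI]ndications :).*(?!Contraindication)", text).group()
print("Contraindication" in negative)  # True
```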

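To break this down for you: (?s) enables single-line mode (called DOTALL in Python), so . also matches newlines, and (?<=[iI]ndications :) is a lookbehind that requires the match to be preceded by indications : or Indications :.

.* matches everything in between. It is greedy, so if Contraindication appears more than once it extends to the last occurrence; use .*? to stop at the first one.

and (?=Contraindication) is a lookahead that requires the match to be followed by Contraindication.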
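The parts of the breakdown can be checked one at a time in Python. This sketch uses a made-up input with two Indications/Contraindication sections (hypothetical, not from the question) to show that passing `flags=re.DOTALL` does the same job as `(?s)`, and how greedy `.*` differs from lazy `.*?` when `Contraindication` occurs more than once.

```python
import re

# Hypothetical input containing two sections instead of one.
text = (
    "Indications :\nfirst drug use\nContraindication :\nnone\n"
    "Indications :\nsecond drug use\nContraindication :\npregnancy\n"
)

greedy = r"(?<=[iI]ndications :).*(?=Contraindication)"
lazy = r"(?<=[iI]ndications :).*?(?=Contraindication)"

# Without DOTALL (or (?s)), . cannot cross the newline after "Indications :".
no_dotall = re.search(lazy, text)
print(no_dotall)  # None

# Greedy .* runs to the LAST "Contraindication", swallowing the first section's end.
greedy_match = re.search(greedy, text, flags=re.DOTALL).group()
print(repr(greedy_match))

# Lazy .*? stops at the FIRST "Contraindication"; findall yields one block per section.
lazy_blocks = re.findall(lazy, text, flags=re.DOTALL)
print(lazy_blocks)  # ['\nfirst drug use\n', '\nsecond drug use\n']
```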

Neither the lookahead nor the lookbehind is included in the match. If you also want to include the word Indications, just remove the (?<= ... ) wrapper around it and keep [iI]ndications : as ordinary text in the pattern.
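On the question's second point (one group per line): Python's `re` keeps only the last repetition of a repeated capture group, so a common workaround is to capture the whole block and split it into lines afterwards. A minimal sketch, reusing the visible part of the question's sample text; `last`, `block` and `lines` are illustrative names, and the lazy `.*?` is a choice made here so the match stops at the first `Contraindication`.

```python
import re

# Visible part of the sample text from the question.
text = """Zelite and FOS ont are limiting small bowel disturbance.
Indications :
chronical kidney disease (IRC)
management of urolithiasis and low tract urinary syndrome
hepatic encephalitis
management of acidophils urinary stone : purine, cystine…
Contraindication :
pregnancy, lactation, growth
Length of the treatment : ...
"""

# A repeated group only remembers its last repetition.
last = re.search(r"(?:(\w+)\n)+", "a\nb\nc\n").group(1)
print(last)  # c

# Workaround: capture the whole block, then split it into lines in Python.
block = re.search(r"(?s)(?<=[iI]ndications :).*?(?=Contraindication)", text).group()
lines = [line.strip() for line in block.splitlines() if line.strip()]
print(lines)
```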

Upvotes: 2
