Reputation: 423
I am trying to parse text journals, and I am only interested in specific sections of the text. I thought I was doing fine until I noticed that I was inadvertently matching references to sections as well as the sections themselves.
Suppose that I want to match the following section.
Section 7 - Delivering Terminal Diagnosis's
which may also show up as
Section 7. Delivering a Terminal Diagnosis
But I don't want to match anything if the words see or under precede my string like below.
see Section 7. Delivering a Terminal Diagnosis
or
filed under Section 7. Delivering a Terminal Diagnosis
should not match anything.
I tried using a negative look-ahead, but it only excludes the words themselves; it doesn't throw out the entire match.
((?!see )Section[\s\\n]+7[\s+]+?[-:\\n\.]+?[\s+]+?(Delivering|Deliver)(.*terminal[\s+]+Diagnosis('s)?)?[\.]?)
I don't think I am grasping the look-around concept properly. Help?
Upvotes: 1
Views: 1711
Reputation: 70722
Try the following.
Whatever pattern you end up using, I would put r in front of your regular expression. The r prefix is Python's raw string notation, which saves you from double-escaping backslashes in the pattern. To avoid having to care about uppercase versus lowercase, use re.I for case-insensitive matching.
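As a small sketch of the two points above (the sample pattern and text are made up for illustration), a raw string and a doubled backslash spell the same pattern, and re.I makes capitalization irrelevant:

```python
import re

# Without the r prefix, the backslash must be doubled to reach the regex engine.
plain = "Section\\s+7"
raw = r"Section\s+7"
print(plain == raw)  # True: both spell the same pattern

# Hypothetical input whose capitalization differs from the pattern.
text = "SECTION   7"
print(re.search(raw, text) is not None)        # False: matching is case-sensitive by default
print(re.search(raw, text, re.I) is not None)  # True: re.I ignores case
```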
Here's a possible solution using two negative lookbehinds.
(?<!see)(?<!under)\s+(section 7[\s.:-]+(?:deliver(?:ing)?).*?terminal\s+diagnosis(?:'s)?)
To show what I meant by using raw string notation and re.I:
import re
# s is the text to search
matches = re.findall(r"(?<!see)(?<!under)\s+(section 7[\s.:-]+(?:deliver(?:ing)?).*?terminal\s+diagnosis(?:'s)?)", s, re.I)
print(matches)
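A self-contained way to try this pattern against the strings from the question might look like the sketch below. The "Intro." prefix and the samples list are assumptions added for illustration; the pattern itself is unchanged.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(
    r"(?<!see)(?<!under)\s+(section 7[\s.:-]+(?:deliver(?:ing)?).*?terminal\s+diagnosis(?:'s)?)",
    re.I,
)

# Hypothetical sample text built from the lines in the question.
samples = [
    "Intro. Section 7 - Delivering Terminal Diagnosis's",
    "Intro. Section 7. Delivering a Terminal Diagnosis",
    "see Section 7. Delivering a Terminal Diagnosis",
    "filed under Section 7. Delivering a Terminal Diagnosis",
]

for s in samples:
    # findall returns the contents of the single capture group for each match.
    print(repr(s), "->", PATTERN.findall(s))
```

The first two samples should each yield one match and the last two an empty list. Note that the pattern begins with \s+, so it requires at least one whitespace character before "section"; a heading at the very start of the text would not match as written.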
Upvotes: 2
Reputation: 25954
Negative look-ahead does what it says: specifies a group that cannot match after your main expression. But you don't have anything before it.
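Use a negative lookbehind in lieu of (?!see ). Note that Python's re module requires lookbehinds to be fixed-width, so (?<!see|under) is rejected there; chain two lookbehinds instead:
(?<!see )(?<!under )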
Other comments: you have a case error (terminal should be Terminal), and if you make your entire string "raw" by prepending it with an r, like r'my string', you don't need to double-escape characters like \n.
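The difference between the two assertions can be seen in a short sketch. The shortened pattern Section\s+7 is an assumption used here to keep the example small; it is not the full pattern from the question.

```python
import re

text = "see Section 7. Delivering a Terminal Diagnosis"

# Lookahead: at the position where "Section" starts, the next characters
# are "Sect", not "see ", so the assertion passes and the match goes through.
lookahead = re.search(r"(?!see )Section\s+7", text)
print(lookahead.group(0))  # Section 7

# Lookbehind: checks the characters before the position, so "see " blocks it.
lookbehind = re.search(r"(?<!see )(?<!under )Section\s+7", text)
print(lookbehind)  # None

# Python's re module rejects alternatives of different widths in a lookbehind.
try:
    re.compile(r"(?<!see|under)Section")
except re.error as exc:
    print("re.error:", exc)
```

Because "see" and "under" have different lengths, the sketch chains two separate lookbehinds rather than combining them with |.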
Upvotes: 3