Reputation: 156
I have the content of an English dictionary at hand and I want to find the definition for a specific example sentence.
For example, I want to find the definition for "example sentence 2b". I think the code could look like this:
re.search(r'\d\. ([^\n]*?)\n(?!.*\d\. ).*?example sentence 2b', content, flags=re.DOTALL)
Here, the "content" is as follows:
1. definition1
example sentence 1a
example sentence 1b
2. definition2
example sentence 2a
example sentence 2b
3. definition3
example sentence 3a
example sentence 3b
Live test here - https://regex101.com/r/UOz6DA/1/
As you can see in the live test, I didn't get the desired match, "definition2", and I don't know why.
PS: I used (?!.*\d\. ).* based on this post - regex how to exclude specific characters or string anywhere
Upvotes: 0
Views: 128
Reputation: 156
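The pattern fails because of the "3. " substring, even though it appears after "example sentence 2b". With re.DOTALL, the .* inside the negative lookahead (?!.*\d\. ) can cross line breaks, so the lookahead scans to the end of the content and rejects any position that is followed by a numbered definition anywhere later in the text.
For a simpler example, if you enable the "s" flag in this live demo, the second line no longer matches because of the "chocolate" substring in the third line.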
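The effect can be reproduced in Python with the content from the question. This is only a minimal sketch: the variable names and the truncation step are illustrative assumptions, not something from the original post.

```python
import re

# Content from the question, joined with line breaks (no trailing newline).
content = "\n".join([
    "1. definition1",
    "example sentence 1a",
    "example sentence 1b",
    "2. definition2",
    "example sentence 2a",
    "example sentence 2b",
    "3. definition3",
    "example sentence 3a",
    "example sentence 3b",
])

# Pattern from the question, unchanged.
original = r'\d\. ([^\n]*?)\n(?!.*\d\. ).*?example sentence 2b'

# With DOTALL the lookahead sees "3. " further down and rejects the match.
full_match = re.search(original, content, flags=re.DOTALL)
print(full_match)  # None

# Illustrative check: cut the content off just before "3. ".
truncated = content.split("3. ")[0]
truncated_match = re.search(original, truncated, flags=re.DOTALL)
print(truncated_match.group(1))  # definition2
```

Once the text after "example sentence 2b" no longer contains a numbered definition, the same pattern finds "definition2", which suggests the lookahead is what blocks the match.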
Upvotes: 0
Reputation: 19641
You may use the following pattern without the re.DOTALL flag (in Python, add re.MULTILINE so that ^ matches at the start of every line):
^\d+\. (.*)(?:\n(?!\d+\. ).*)*\nexample sentence 2b
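To try this pattern in Python, something like the sketch below should work. The helper name find_definition and the use of re.escape for the target sentence are my own assumptions for illustration; the regex itself is the one shown above.

```python
import re

# Content from the question, joined with line breaks.
content = "\n".join([
    "1. definition1",
    "example sentence 1a",
    "example sentence 1b",
    "2. definition2",
    "example sentence 2a",
    "example sentence 2b",
    "3. definition3",
    "example sentence 3a",
    "example sentence 3b",
])


def find_definition(text, sentence):
    """Hypothetical helper: return the definition that owns `sentence`, or None."""
    # Same pattern as above, with the target sentence escaped and appended.
    pattern = r'^\d+\. (.*)(?:\n(?!\d+\. ).*)*\n' + re.escape(sentence)
    # MULTILINE lets ^ match at every line start; DOTALL is deliberately not set.
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1) if match else None


print(find_definition(content, "example sentence 2b"))  # definition2
```

Because DOTALL is off, each .* stays on its own line, so the lookahead only inspects the line that immediately follows a line break.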
Breakdown of the pattern:
^ - Beginning of line.
\d+\.  - Match one or more digits, then a dot, and a space character.
(.*) - Match zero or more characters and capture them in group 1.
(?: - Beginning of a non-capturing group.
    \n(?!\d+\. ) - Match a line-break that is not followed by a "definition line".
    .* - Match zero or more characters.
) - Close the non-capturing group.
* - Match the previous group between zero and unlimited times.
\nexample sentence 2b - Match a linebreak character followed by the target sentence.
Upvotes: 2