Reputation: 5913
I have the following table of contents and sections in my file:
1.2 Purpose .................... 8
1.3 System Overview ............ 8
1.4 Document Overview .......... 8
1.5 Definitions and Acronyms ......... 9
2.1.3.3.8 FOO
2.1.3.3.9 BAR
2.1.4 TEST
I'd like to extract the section names and ignore the lines that are part of the table of contents.
I've been trying this regular expression:
^((?:\d{1,2}\.)+(?:\d{1,2})+)\s.+(?!\.\.\.).*$
However, I keep capturing the table of contents lines.
How can I exclude the lines with the .... strings?
Thanks!
Upvotes: 1
Views: 71
Reputation: 295423
The problem here was that you were only excluding dots at one very specific place; your negative lookahead didn't look beyond the position it was placed in. Consider instead:
^(\d{1,2}(?:\.\d{1,2})*)\s*[^.]*(?!.*\.{3}).*$
#                                  ^^
...the characters with the carets below them are critical: they make the negative lookahead apply not only at that specific point, but anywhere after it as well.
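To check this against the sample lines from the question, here is a minimal sketch. The question doesn't say which regex engine is in use, so Python's re module is an assumption, and the names LINES, ORIGINAL, FIXED and section_numbers are made up for illustration:

```python
import re

# Sample lines copied from the question: four table-of-contents
# entries followed by three real section headings.
LINES = [
    "1.2 Purpose .................... 8",
    "1.3 System Overview ............ 8",
    "1.4 Document Overview .......... 8",
    "1.5 Definitions and Acronyms ......... 9",
    "2.1.3.3.8 FOO",
    "2.1.3.3.9 BAR",
    "2.1.4 TEST",
]

# The question's pattern: the lookahead is tested at a single position,
# and the greedy ".+" before it can simply consume the dots first.
ORIGINAL = re.compile(r"^((?:\d{1,2}\.)+(?:\d{1,2})+)\s.+(?!\.\.\.).*$")

# The answer's pattern: the ".*" inside the lookahead scans the rest of
# the line, so three dots anywhere ahead reject the match.
FIXED = re.compile(r"^(\d{1,2}(?:\.\d{1,2})*)\s*[^.]*(?!.*\.{3}).*$")


def section_numbers(pattern, lines):
    """Return the captured section number of every line the pattern matches."""
    hits = []
    for line in lines:
        match = pattern.match(line)
        if match:
            hits.append(match.group(1))
    return hits


original_hits = section_numbers(ORIGINAL, LINES)
fixed_hits = section_numbers(FIXED, LINES)
print("original:", original_hits)
print("fixed:   ", fixed_hits)
```

With the question's pattern all seven lines match, table of contents included. With the pattern above only 2.1.3.3.8, 2.1.3.3.9 and 2.1.4 come back.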
Upvotes: 3