Ray Salemi

Reputation: 5913

How to exclude lines containing "..." with a regular expression

I have the following table of contents and sections in my file:

1.2 Purpose .................... 8  
1.3 System Overview ............ 8  
1.4 Document Overview .......... 8  
1.5 Definitions and Acronyms ......... 9  
2.1.3.3.8   FOO 
2.1.3.3.9  BAR 
2.1.4 TEST

I'd like to extract the section names and ignore the lines that are part of the table of contents.

I've been trying this regular expression:

^((?:\d{1,2}\.)+(?:\d{1,2})+)\s.+(?!\.\.\.).*$

However, I keep capturing the table of contents lines.
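A minimal reproduction in Python. The question doesn't say which regex engine is in use, so Python's re module is an assumption here. It puts the sample lines in one string and uses re.MULTILINE so that ^ and $ anchor at each line. With the pattern above, every line matches, table-of-contents entries included:

```python
import re

# Sample input from the question (trailing whitespace dropped).
text = """1.2 Purpose .................... 8
1.3 System Overview ............ 8
1.4 Document Overview .......... 8
1.5 Definitions and Acronyms ......... 9
2.1.3.3.8   FOO
2.1.3.3.9  BAR
2.1.4 TEST"""

# The pattern from the question; re.MULTILINE makes ^ and $ match per line.
original = re.compile(r"^((?:\d{1,2}\.)+(?:\d{1,2})+)\s.+(?!\.\.\.).*$", re.MULTILINE)

# findall returns the captured section number for each matching line.
matches = original.findall(text)
print(matches)  # all seven section numbers, including the TOC lines
```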

How can I exclude the lines that contain the runs of dots ("...")?

Thanks!

Upvotes: 1

Views: 71

Answers (1)

Charles Duffy

Reputation: 295423

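The problem here was that you were only excluding "..." at one specific position. A negative lookahead only tests the text immediately following the point where it sits, and in your pattern that point comes right after the greedy .+, which has already consumed the rest of the line. At the end of the line there is no "..." left to see, so (?!\.\.\.) always succeeds. Consider instead: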

^(\d{1,2}(?:\.\d{1,2})*)\s*[^.]*(?!.*\.{3}).*$
#                                  ^^

The two characters marked with carets (the .* inside the lookahead) are critical: they let the negative lookahead scan from that point to the end of the line, so a "..." anywhere after it makes the match fail, not only one that starts at exactly that position.
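A minimal sketch of applying the pattern above in Python, assuming the file has already been read into a string. The helper name section_headings is hypothetical, and splitting each match into a (number, title) pair is an illustration rather than part of the original answer. The pattern is applied to one line at a time:

```python
import re

# Sample input from the question (trailing whitespace dropped).
text = """1.2 Purpose .................... 8
1.3 System Overview ............ 8
1.4 Document Overview .......... 8
1.5 Definitions and Acronyms ......... 9
2.1.3.3.8   FOO
2.1.3.3.9  BAR
2.1.4 TEST"""

# The pattern from the answer. The ".*" inside (?!...) makes the lookahead
# scan the rest of the line for "...", not just one fixed position.
fixed = re.compile(r"^(\d{1,2}(?:\.\d{1,2})*)\s*[^.]*(?!.*\.{3}).*$")


def section_headings(lines):
    """Hypothetical helper: return (number, title) pairs for heading lines."""
    result = []
    for line in lines:
        m = fixed.match(line)
        if m:
            number = m.group(1)
            # Title = everything after the section number, whitespace trimmed.
            title = line[m.end(1):].strip()
            result.append((number, title))
    return result


print(section_headings(text.splitlines()))
# [('2.1.3.3.8', 'FOO'), ('2.1.3.3.9', 'BAR'), ('2.1.4', 'TEST')]
```

Matching line by line matters here: [^.] and \s can also match a newline, so running the same pattern once over the whole string with re.MULTILINE lets the FOO match run on into the next line, and for this sample re.findall returns only ['2.1.3.3.8', '2.1.4'].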

Upvotes: 3
