Batakj

Reputation: 12743

Need help to resolve regular expression for Table of Contents

I have to parse a document that contains a table of contents. The generated document also contains text that is not part of the table of contents, e.g. headers and footers.



2.1 some_text 100
2.1. some_text 100
some_text 100

I have written a regex to check whether a line is part of the table of contents:


(\d+(\.\d*)?)(.*)(\d{1,3})

But it matches all of the lines above. I want it to fail on the third line, i.e. some_text 100.

Please help.

Upvotes: 1

Views: 43

Answers (1)

Wiktor Stribiżew

Reputation: 626932

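Your pattern is unanchored, so it can match the trailing 100 of the third line on its own. You need to use the ^ anchor in multiline mode ((?m)), so that a match must begin at the start of a line: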

(?m)^(\d+(\.\d*)?)(.*)(\d{1,3})
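As a minimal sketch of the difference, assuming Python's re module and the three sample lines from the question joined into one string (the names text, unanchored and anchored are only illustrative):

```python
import re

# The three sample lines from the question, as one multi-line string.
text = "2.1 some_text 100\n2.1. some_text 100\nsome_text 100"

# Original pattern from the question: no anchor, so a match may start anywhere.
unanchored = re.compile(r"(\d+(\.\d*)?)(.*)(\d{1,3})")

# Pattern from this answer: (?m) makes ^ match at the start of every line.
anchored = re.compile(r"(?m)^(\d+(\.\d*)?)(.*)(\d{1,3})")

# Collect the full text of every match found in the input.
unanchored_hits = [m.group(0) for m in unanchored.finditer(text)]
anchored_hits = [m.group(0) for m in anchored.finditer(text)]

print(unanchored_hits)  # includes a bare "100" taken from the third line
print(anchored_hits)    # only the two lines that start with a number
```

The unanchored pattern still finds a match inside the third line because the digits 100 alone satisfy it; the anchored pattern rejects that line because it does not start with a digit.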


You might even want to check if the number is at the end of the line with the $ anchor:

(?m)^\d+(?:\.\d*)?.*\d{1,3}$

Note that I removed all capturing groups from the last regex to keep it clean. If you plan to use the captured text, you can restore them.
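If the captured text is needed, here is a rough Python sketch of what restoring the groups gives. One thing to watch for: the greedy .* swallows most of the page number, so the last group ends up holding only its final digit. The second pattern, variant, is a hypothetical alternative that is not part of the answer above; it assumes the section number, title and page number are separated by spaces or tabs.

```python
import re

# The three sample lines from the question, as one multi-line string.
text = "2.1 some_text 100\n2.1. some_text 100\nsome_text 100"

# The stricter pattern with its capturing groups restored.
restored = re.compile(r"(?m)^(\d+(?:\.\d*)?)(.*)(\d{1,3})$")

# Hypothetical variant (an assumption, not from the answer): explicit
# space/tab separators and a lazy title, so the page number is captured whole.
variant = re.compile(r"(?m)^(\d+(?:\.\d+)*\.?)[ \t]+(.*?)[ \t]+(\d{1,3})$")

restored_groups = [m.groups() for m in restored.finditer(text)]
variant_groups = [m.groups() for m in variant.finditer(text)]

print(restored_groups)  # page group holds only the last digit, "0"
print(variant_groups)   # page group holds "100"
```

Neither pattern matches the third line, since both require a digit at the start of the line.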

Upvotes: 3
