Need help to resolve regular expression for Table of Contents

Question

I have to parse a document that contains a table of contents. The generated document also contains text that is not part of the table of contents, e.g. headers and footers.



2.1 some_text 100
2.1. some_text 100
some_text 100

I have written a regex to check whether a line is part of the table of contents:


(\d+(\.\d*)?)(.*)(\d{1,3})

However, it matches all of the lines above. I want it to fail on the third line, i.e. some_text 100.

Please help.

Wiktor Stribiżew · Accepted Answer

You need to use the ^ anchor in multiline mode, where it matches at the start of each line:

(?m)^(\d+(\.\d*)?)(.*)(\d{1,3})

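The effect of the ^ anchor can be checked with a short sketch. The question does not say which regex engine is used, so Python's re module is an assumption here (its inline (?m) flag behaves the same way for this pattern), and the header and footer text is made up for illustration.

```python
import re

# The three sample lines from the question.
lines = ["2.1 some_text 100", "2.1. some_text 100", "some_text 100"]

# Original pattern: unanchored, so it can match anywhere inside a line.
unanchored = re.compile(r"(\d+(\.\d*)?)(.*)(\d{1,3})")
# Anchored pattern: ^ with (?m) only matches at the start of a line.
anchored = re.compile(r"(?m)^(\d+(\.\d*)?)(.*)(\d{1,3})")

unanchored_hits = [bool(unanchored.search(s)) for s in lines]
anchored_hits = [bool(anchored.search(s)) for s in lines]

# Hypothetical document with a header and footer around the TOC lines.
text = "Header text\n" + "\n".join(lines) + "\nFooter"
toc_lines = [m.group(0) for m in anchored.finditer(text)]

print(unanchored_hits)  # the third line matches too, via the digits in "100"
print(anchored_hits)    # the third line is now rejected
print(toc_lines)
```

Without the anchor, the engine finds the digits of "100" inside the third line and reports a match. With the anchor, the match must begin with a digit at the start of the line.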

You might even want to check if the number is at the end of the line with the $ anchor:

(?m)^\d+(?:\.\d*)?.*\d{1,3}$

Note that I removed all capturing groups from the last regex to keep it clean. If you plan to use the captured texts, you can restore them.
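If the groups are restored, one thing to watch is that the greedy .* takes as much as it can and leaves only the final digit for the page-number group. The sketch below, again assuming Python's re module, shows the $ anchor in action and one possible way to capture the fields. The lazy .*? and the helper name parse_toc_line are illustrative assumptions, not part of the answer's regex.

```python
import re

# End-anchored pattern from the answer (no capturing groups).
toc_line = re.compile(r"(?m)^\d+(?:\.\d*)?.*\d{1,3}$")

ends_with_page = bool(toc_line.search("2.1 some_text 100"))
trailing_text = bool(toc_line.search("2.1 some_text 100 extra"))

# Groups restored as-is: the greedy .* leaves a single digit for the page group.
greedy = re.compile(r"(?m)^(\d+(?:\.\d*)?)(.*)(\d{1,3})$")
greedy_groups = greedy.search("2.1 some_text 100").groups()

# Assumption: a lazy .*? lets the page group take the whole trailing number.
lazy = re.compile(r"(?m)^(\d+(?:\.\d*)?)(.*?)(\d{1,3})$")

def parse_toc_line(line):
    """Hypothetical helper: return (number, title, page) or None."""
    m = lazy.search(line)
    if m is None:
        return None
    number, title, page = m.groups()
    # Trim the separating spaces and a stray dot left over from "2.1."
    return number, title.strip(" ."), int(page)

print(ends_with_page, trailing_text)
print(greedy_groups)
print(parse_toc_line("2.1. some_text 100"))
```

A title that itself ends in digits would still be ambiguous with this approach, so treat it as a starting point rather than a complete parser.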
