scottwernervt

Reputation: 470

Regular expression to match headers and sub headers followed by a bullet list

I am trying to capture the text in headers/subheaders and the bullet list that follows them with:

re.finditer(r'(?!^\* )(?P<description>^.+?)(?P<items>^\* .+?^)(?!^\* )',
            text, flags=re.DOTALL | re.MULTILINE)

with this sample text:

Header A
Subheader A
* Item A
* Item B
* Item C
Header B
Subheader B
Description B
* Item 1
* Item 2
* Item 3
Random Header C
* Item X
* Item Y
* Item Z

The expression works except on Random Header C and its bullet list. A workaround is to add two trailing line breaks \n\n after * Item Z. Any idea how to match the last section, or is there a better method for doing this?

https://regex101.com/r/yG7sJ6/1

Upvotes: 2

Views: 1315

Answers (1)

anubhava

Reputation: 785481

You can use this regex, which also captures the last section because each bullet line may end with either a line break or the end of the text:

(?P<description>^.+?)(?P<items>(?:^\* [^\n]+(?:\n|$))+)
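
For reference, here is a minimal sketch of how this pattern might be used from Python on the sample text in the question. It assumes the same re.DOTALL | re.MULTILINE flags as the original attempt; the names SAMPLE, PATTERN, and parse_sections are made up for illustration.

```python
import re

# Sample text from the question, deliberately without a trailing newline.
SAMPLE = """\
Header A
Subheader A
* Item A
* Item B
* Item C
Header B
Subheader B
Description B
* Item 1
* Item 2
* Item 3
Random Header C
* Item X
* Item Y
* Item Z"""

# Assumption: same flags as in the question (DOTALL so "." spans lines,
# MULTILINE so "^" and "$" match at line boundaries).
PATTERN = re.compile(
    r'(?P<description>^.+?)(?P<items>(?:^\* [^\n]+(?:\n|$))+)',
    flags=re.DOTALL | re.MULTILINE,
)


def parse_sections(text):
    """Return a list of (header_lines, bullet_items) tuples."""
    sections = []
    for match in PATTERN.finditer(text):
        headers = match.group('description').splitlines()
        # Drop the leading "* " from every bullet line.
        items = [line[2:] for line in match.group('items').splitlines()]
        sections.append((headers, items))
    return sections


for headers, items in parse_sections(SAMPLE):
    print(headers, items)
```

The `(?:\n|$)` alternative is what lets the final bullet match even when the text does not end with a line break.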

RegEx Demo

Upvotes: 2
