Reputation: 470
I am trying to capture the text in headers/subsections and the bullet list that follows it with:
re.finditer(r'(?!^\* )(?P<description>^.+?)(?P<items>^\* .+?^)(?!^\* )',
            text, flags=re.DOTALL | re.MULTILINE)
with this sample text:
Header A
Subheader A
* Item A
* Item B
* Item C
Header B
Subheader B
Description B
* Item 1
* Item 2
* Item 3
Random Header C
* Item X
* Item Y
* Item Z
The expression works except on Random Header C and its bullet list. A workaround is to add two trailing line breaks (\n\n) after the last item, * Item Z. Any idea how to match the last section, or is there a better method for doing this?
https://regex101.com/r/yG7sJ6/1
Upvotes: 2
Views: 1315
Reputation: 785481
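You can use this regex to capture the missing last section. Each bullet line may end with either a newline or the end of the input ($), so no trailing line breaks are needed: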
(?P<description>^.+?)(?P<items>(?:^\* [^\n]+(?:\n|$))+)
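As a quick check, here is a minimal Python sketch that runs both patterns against the sample text from the question. It assumes the same re.DOTALL | re.MULTILINE flags as the question (the answer does not state them) and that the text has no trailing newline. The names SAMPLE, ORIGINAL, FIXED and sections are only illustrative.

```python
import re

# Sample text from the question, deliberately without a trailing newline.
SAMPLE = "\n".join([
    "Header A", "Subheader A", "* Item A", "* Item B", "* Item C",
    "Header B", "Subheader B", "Description B",
    "* Item 1", "* Item 2", "* Item 3",
    "Random Header C", "* Item X", "* Item Y", "* Item Z",
])

# Pattern from the question and pattern from the answer, as raw strings.
ORIGINAL = r'(?!^\* )(?P<description>^.+?)(?P<items>^\* .+?^)(?!^\* )'
FIXED = r'(?P<description>^.+?)(?P<items>(?:^\* [^\n]+(?:\n|$))+)'

# Assumption: same flags as in the question.
FLAGS = re.DOTALL | re.MULTILINE


def sections(pattern, text):
    """Return a list of (header_lines, item_texts) tuples, one per match."""
    result = []
    for m in re.finditer(pattern, text, flags=FLAGS):
        headers = m.group('description').splitlines()
        # Drop the leading "* " from every bullet line.
        items = [line[2:] for line in m.group('items').splitlines()]
        result.append((headers, items))
    return result


original_sections = sections(ORIGINAL, SAMPLE)
fixed_sections = sections(FIXED, SAMPLE)

print(len(original_sections))  # 2: the last section is missed
print(len(fixed_sections))     # 3: the last section is captured
for headers, items in fixed_sections:
    print(headers, items)
```

The original pattern fails on the last section because its items group must be followed by another line start (^), and there is none after the final item when the text ends without a newline. The second pattern accepts $ in that position instead.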
Upvotes: 2