Reputation: 4990
I am trying to create a regex to match the content between the numbers in a numbered list, e.g. with the following content:
1) Text for part 1 2) Text for part 2 3) Text for part 3
Upvotes: 1
Views: 221
Reputation: 3167
Keep in mind that the text after the number and the parenthesis might be anything. This will find your substrings:
\d\).+?(?=\d\)|$)
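As a rough illustration, here is how that pattern behaves on the sample from the question. This is a sketch using Python's `re` module (my assumption, since the question doesn't name a language); the variable names are made up for the example.

```python
import re

# Sample text taken from the question.
text = "1) Text for part 1 2) Text for part 2 3) Text for part 3"

# A digit and ")", then as little text as possible up to the next "digit)" or the end.
pattern = re.compile(r"\d\).+?(?=\d\)|$)")

matches = pattern.findall(text)
for m in matches:
    print(repr(m))
# '1) Text for part 1 '
# '2) Text for part 2 '
# '3) Text for part 3'
```

Note that each match still includes the number and the whitespace around the text.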
EDIT:
To get rid of the leading whitespace and return only the text without the number, take group 1 from the following match:
\d\)\s*(.+?)(?=\d\)|$)
To get the number in group 1 and the text in group 2, use this:
(\d)\)\s*(.+?)(?=\d\)|$)
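A small sketch of the two-group variant, again assuming Python's `re` module. It uses `\s*` after the parenthesis, since `\s` is the class that matches whitespace (`\w` matches word characters). The names `items` and `by_number` are only for illustration.

```python
import re

text = "1) Text for part 1 2) Text for part 2 3) Text for part 3"

# Group 1: the list number. Group 2: the item text (leading whitespace skipped by \s*).
pattern = re.compile(r"(\d)\)\s*(.+?)(?=\d\)|$)")

items = pattern.findall(text)
# The lazy group still keeps the space before the next number, so strip it here.
by_number = {int(num): body.strip() for num, body in items}
print(by_number)
# {1: 'Text for part 1', 2: 'Text for part 2', 3: 'Text for part 3'}
```

Because the number is matched by a single `\d`, this sketch only handles lists with up to nine items; `\d+` would be needed for longer lists.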
Upvotes: 0
Reputation: 7468
I'd suggest the following (PCRE):
(?:\d+\)\s*(.*?))*$
The inner part \d+\)\s*
matches the list number and the closing parenthesis, followed by optional whitespace.
(.*?)
matches the list text, but in a non-greedy manner (otherwise, it would also match the next list item).
The enclosing (?: )*$
then matches the above zero or more times, until the end of the input.
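One thing worth checking before relying on this: when a capturing group sits inside a repeated group, only the last repetition is kept. The sketch below shows this with Python's `re` module (an assumption; the answer targets PCRE, where repeated groups follow the same rule).

```python
import re

text = "1) Text for part 1 2) Text for part 2 3) Text for part 3"

# The whole list is consumed by a single match of the repeated group.
pattern = re.compile(r"(?:\d+\)\s*(.*?))*$")

m = pattern.search(text)
whole = m.group(0)   # the entire input
last = m.group(1)    # only the text of the final list item
print(repr(whole))
print(repr(last))
# '1) Text for part 1 2) Text for part 2 3) Text for part 3'
# 'Text for part 3'
```

So this form is handy for validating that the input is a numbered list, but getting every item out would need a per-item pattern applied repeatedly.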
Upvotes: 0
Reputation: 50858
The following PCRE should work, assuming you don't have anything formatted like "1)" inside the sections:
\d+\)\s*(.*?)\s*(?=\d+\)|$)
Explanation:
\d+\)
matches a number followed by a ).
\s*
matches the leading whitespace.
(.*?)
captures the contents non-greedily.
\s*
matches the trailing whitespace.
(?=\d+\)|$)
ensures that the match is followed by either the start of a new section or the end of the text.
Note that it doesn't enforce that the numbers are ascending or anything like that, so it would also match the following text:
4) Hello there 1) How are you? 5) Good.
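To make this concrete, here is a minimal sketch that runs the pattern over both the question's sample and the out-of-order example. It assumes Python's `re` module, which accepts this pattern unchanged; the helper name `split_items` is hypothetical.

```python
import re

# Number and ")", optional whitespace, lazily captured text, optional whitespace,
# then a lookahead for the next "number)" or the end of the text.
ITEM = re.compile(r"\d+\)\s*(.*?)\s*(?=\d+\)|$)")

def split_items(text):
    """Return the text of each numbered section, without numbers or padding."""
    return ITEM.findall(text)

in_order = split_items("1) Text for part 1 2) Text for part 2 3) Text for part 3")
out_of_order = split_items("4) Hello there 1) How are you? 5) Good.")
print(in_order)      # ['Text for part 1', 'Text for part 2', 'Text for part 3']
print(out_of_order)  # ['Hello there', 'How are you?', 'Good.']
```

With a single capturing group, `findall` returns just the captured text, so the surrounding whitespace matched by the two `\s*` parts is dropped.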
Upvotes: 2