mickyjtwin

Reputation: 4990

Regex match for text

I am trying to create a regex that matches the text of each item in a numbered list, e.g. with the following content:

1) Text for part 1 2) Text for part 2 3) Text for part 3

Upvotes: 1

Views: 221

Answers (3)

Michał Turecki

Reputation: 3167

Keep in mind that the text after the number and the closing parenthesis can be anything. This pattern finds each substring:

\d\).+?(?=\d\)|$)
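As a quick check, here is a minimal sketch of that pattern run against the sample text from the question. Python's `re` module is my assumption here (the question does not name a language), and the variable names are only for illustration.

```python
import re

# Sample input taken from the question
text = "1) Text for part 1 2) Text for part 2 3) Text for part 3"

# One digit and ")", then lazily matched text up to the next "N)" or end of input
pattern = re.compile(r"\d\).+?(?=\d\)|$)")

# No capture groups, so findall returns the full matches
items = pattern.findall(text)
for item in items:
    print(repr(item))
```

Each match still contains the number and any trailing space, e.g. `'1) Text for part 1 '`.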

EDIT:

To drop the leading whitespace and return only the text without the number, take group 1 from the following match:

\d\)\s*(.+?)(?=\d\)|$)

To get the number in group 1 and the text in group 2, use this:

(\d)\)\s*(.+?)(?=\d\)|$)
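A rough sketch of the two-group variant, again assuming Python's `re`. It uses `\s*` to skip the whitespace after the parenthesis, and the extra `strip()` call is my addition to trim the trailing space that the lazy group still picks up.

```python
import re

text = "1) Text for part 1 2) Text for part 2 3) Text for part 3"

# Group 1: the single-digit number; group 2: the item text
pattern = re.compile(r"(\d)\)\s*(.+?)(?=\d\)|$)")

# strip() removes the space left before the next "N)" (an addition for this sketch)
pairs = [(int(m.group(1)), m.group(2).strip()) for m in pattern.finditer(text)]
print(pairs)
```

Because the pattern uses a single `\d`, it only fits lists with up to nine items; `\d+` would be needed for longer lists.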

Upvotes: 0

Daniel Gehriger

Reputation: 7468

I'd suggest the following (PCRE):

(?:\d+\)\s*(.*?))*$

  • The inner part \d+\)\s* matches the list number and the closing parenthesis, followed by optional whitespace.

  • (.*?) matches the list text, but in a non-greedy manner (otherwise, it would also match the next list item).

  • The enclosing (?: )*$ then matches the above zero or more times, until the end of the input.
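One thing worth checking when trying this pattern: a capture group inside a repetition normally keeps only its last iteration. The sketch below assumes Python's `re` rather than PCRE itself, but it shows the effect on the sample text from the question.

```python
import re

text = "1) Text for part 1 2) Text for part 2 3) Text for part 3"

# The whole list is consumed by one match; group 1 is overwritten on each repetition
pattern = re.compile(r"(?:\d+\)\s*(.*?))*$")

m = pattern.match(text)
print(m.group(0) == text)  # the single match spans the entire input
print(m.group(1))          # only the text of the last item is retained
```

So if every item is needed, a pattern that matches one item at a time (applied with a find-all style call) is probably easier to work with.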

Upvotes: 0

The following PCRE should work, assuming none of the sections contains anything formatted like "1)":

\d+\)\s*(.*?)\s*(?=\d+\)|$)

Explanation:

  • \d+\) matches a number followed by a ).
  • \s* skips any whitespace after the ).
  • (.*?) captures the contents non-greedily.
  • \s* matches the trailing whitespace.
  • (?=\d+\)|$) ensures that the match is followed by either the start of a new section or the end of the text.
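The breakdown above can be written out as a commented pattern. This is only a sketch in Python's `re` with the `re.VERBOSE` flag (my choice of language, the answer itself targets PCRE), run on the sample text from the question.

```python
import re

pattern = re.compile(r"""
    \d+\)          # a number followed by ")"
    \s*            # whitespace after the ")"
    (.*?)          # the item text, captured non-greedily
    \s*            # trailing whitespace, kept out of the capture
    (?=\d+\)|$)    # must be followed by the next item or the end of the text
""", re.VERBOSE)

text = "1) Text for part 1 2) Text for part 2 3) Text for part 3"

# With a single capture group, findall returns just the captured text
parts = pattern.findall(text)
print(parts)
```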

Note that it doesn't enforce ascending numbering or anything like that, so it would also match the following text:

4) Hello there 1) How are you? 5) Good.
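If the ordering matters, it could be checked after matching. In this sketch (Python's `re` assumed) I have wrapped the number in an extra capture group, which is not part of the pattern above, and the `in_order` check is a hypothetical addition.

```python
import re

text = "4) Hello there 1) How are you? 5) Good."

# Same pattern, with an added group around the number (an assumption for this sketch)
pattern = re.compile(r"(\d+)\)\s*(.*?)\s*(?=\d+\)|$)")

matches = pattern.findall(text)          # list of (number, text) tuples
numbers = [int(n) for n, _ in matches]
texts = [t for _, t in matches]

# Hypothetical check: numbers should run 1, 2, 3, ... without gaps
in_order = numbers == list(range(1, len(numbers) + 1))
print(texts, numbers, in_order)
```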

Upvotes: 2
