afaolek

Reputation: 8821

Regex to match words following a pattern

I wasn't sure how to phrase the title, so I'll explain here. I have sample text like this:

Line 1
Contents and text in the line.
It's a paragraph.

Line 2
Those for this line.
Another paragraph

Line 3
More contents.

Line 4
More contents...

How do I extract the paragraphs? I tried this:
(?:Line \d{1,3})(.*?)(?:Line \d{1,3})

This matches only the odd-numbered paragraphs (1, 3, 5, etc.). I'm working in C#, but since this is a regex question, I don't expect the language to make much difference.
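To see why only the odd paragraphs come back, here is a small reproduction using Python's re module. The pattern should behave the same way in .NET for this input. It assumes the sample text uses \n line endings and that the pattern runs in dot-all mode (re.S here, RegexOptions.Singleline in C#). Without that mode, .*? cannot cross a newline and nothing matches. The closing (?:Line \d{1,3}) is part of each match, so it is consumed. The next search therefore starts after "Line 2", and the next header it can begin from is "Line 3". The helper name original_matches is hypothetical.

```python
import re

# Sample text from the question (assumes "\n" line endings).
SAMPLE = (
    "Line 1\n"
    "Contents and text in the line.\n"
    "It's a paragraph.\n"
    "\n"
    "Line 2\n"
    "Those for this line.\n"
    "Another paragraph\n"
    "\n"
    "Line 3\n"
    "More contents.\n"
    "\n"
    "Line 4\n"
    "More contents...\n"
)

# The question's pattern, compiled in dot-all mode so ".*?" can span lines.
# The trailing (?:Line \d{1,3}) is part of the match, so the next header
# is consumed and cannot start the following match.
ORIGINAL = re.compile(r"(?:Line \d{1,3})(.*?)(?:Line \d{1,3})", re.S)


def original_matches(text):
    """Hypothetical helper: captured bodies with surrounding whitespace stripped."""
    return [m.group(1).strip() for m in ORIGINAL.finditer(text)]


if __name__ == "__main__":
    for body in original_matches(SAMPLE):
        print(repr(body))
```

Only the bodies of paragraphs 1 and 3 are returned. Paragraph 4 is never matched because no "Line" header follows it.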

Upvotes: 0

Views: 63

Answers (2)

Ofir Winegarten

Reputation: 9365

If you want to capture only the paragraph text, without the "Line \d" header, you can use this fine-tuned version of your regex:

(?:Line \d+\n)(.*?)(?=\nLine \d+\n|$)


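Many regex flavors do not allow a variable-length pattern such as \d+ inside a lookbehind (.NET is an exception). So, like you, I matched the header with a non-capturing group, then took the text up to the next Line header or the end of the input. The closing part is a lookahead. It does not consume the next header, so every paragraph is matched, not just the odd ones. As with your pattern, run it in Singleline (dot-all) mode so that .*? can cross newlines.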
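A quick check of this pattern, again sketched with Python's re module. Here re.S stands in for Singleline mode. The sketch assumes \n line endings; a file with \r\n endings would need \r?\n in the pattern. The helper name paragraph_bodies is hypothetical.

```python
import re

# Sample text from the question (assumes "\n" line endings).
SAMPLE = (
    "Line 1\n"
    "Contents and text in the line.\n"
    "It's a paragraph.\n"
    "\n"
    "Line 2\n"
    "Those for this line.\n"
    "Another paragraph\n"
    "\n"
    "Line 3\n"
    "More contents.\n"
    "\n"
    "Line 4\n"
    "More contents...\n"
)

# Header is matched by a non-capturing group; only the body is captured.
# The lookahead stops before "\nLine N\n" (or at the end) without consuming
# it, so the next match can start at that header.
BODY = re.compile(r"(?:Line \d+\n)(.*?)(?=\nLine \d+\n|$)", re.S)


def paragraph_bodies(text):
    """Hypothetical helper: each body, with its trailing blank line stripped."""
    return [m.group(1).strip() for m in BODY.finditer(text)]


if __name__ == "__main__":
    for body in paragraph_bodies(SAMPLE):
        print(repr(body))
```

All four bodies are returned. The strip() call only removes the newline left in front of the next header.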

Upvotes: 1

Tim Biegeleisen

Reputation: 522741

Here is a pattern which should work:

(Line \d+.*?)(?=Line|$)

This matches a paragraph beginning with Line, followed by anything up to the start of the next paragraph (i.e. the next Line) or the end of the text. For the last paragraph, it is the end of the text that terminates the match.

You also need to run this regex in dot-all mode (RegexOptions.Singleline in C#). Otherwise, replace .*? with [\s\S]*?, keeping it lazy, so that it can match newlines.
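Here is a sketch comparing the two options from this answer, written with Python's re module: the dot-all flag, and [\s\S]*? with no flag. On this sample they should give identical results. It assumes \n line endings. The helper name paragraphs is hypothetical; it strips the trailing blank line from each match.

```python
import re

# Sample text from the question (assumes "\n" line endings).
SAMPLE = (
    "Line 1\n"
    "Contents and text in the line.\n"
    "It's a paragraph.\n"
    "\n"
    "Line 2\n"
    "Those for this line.\n"
    "Another paragraph\n"
    "\n"
    "Line 3\n"
    "More contents.\n"
    "\n"
    "Line 4\n"
    "More contents...\n"
)

# Option 1: dot-all mode lets ".*?" cross newlines.
WITH_DOTALL = re.compile(r"(Line \d+.*?)(?=Line|$)", re.S)
# Option 2: no flag; [\s\S] matches any character, including newlines.
NO_FLAG = re.compile(r"(Line \d+[\s\S]*?)(?=Line|$)")


def paragraphs(text, pattern):
    """Hypothetical helper: each paragraph including its header, stripped."""
    return [m.group(1).strip() for m in pattern.finditer(text)]


if __name__ == "__main__":
    for p in paragraphs(SAMPLE, WITH_DOTALL):
        print(repr(p))
```

The lookahead stops at any occurrence of the word Line, so a paragraph body containing a capitalized "Line" would be cut short. The first answer's \nLine \d+\n lookahead is stricter in that respect.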


Upvotes: 1
