Reputation: 8821
I'm not sure how to phrase the title, so I'll explain here. I have sample text like this:
Line 1
Contents and text in the line.
It's a paragraph.
Line 2
Those for this line.
Another paragraph
Line 3
More contents.
Line 4
More contents...
How do I extract the paragraphs? I tried this:
(?:Line \d{1,3})(.*?)(?:Line \d{1,3})
This matched only the odd-numbered paragraphs (1, 3, 5, etc.). I'm working with C#, but since this is a regex question I don't think that makes any major difference.
Upvotes: 0
Views: 63
Reputation: 9365
If you want only the text, without the "Line \d" headers, you can use this refinement of your regex:
(?:Line \d+\n)(.*?)(?=\nLine \d+\n|$)
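Your pattern consumed the closing "Line \d" header, so the next match could only start after it, which is why every other paragraph was skipped. Here the header is still matched by a leading non-capturing group, as in your attempt (many regex engines restrict lookbehind to fixed-width patterns, although .NET does not), and the paragraph is captured up to a lookahead for the next Line header or the end of the input. The lookahead checks the next header without consuming it, so the following match can start there. Because a paragraph spans several lines, enable single-line (dot-all) mode, RegexOptions.Singleline in C#, so that . also matches newlines.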
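As a rough illustration, here is the difference between the two patterns sketched in Python rather than C# (an assumption made only so the snippet is easy to run; the constructs used here behave the same way in .NET, where re.DOTALL corresponds to RegexOptions.Singleline). The sample string assumes that each Line N header sits on its own line.

```python
import re

# Assumed input: each "Line N" header is on its own line.
text = (
    "Line 1\n"
    "Contents and text in the line.\n"
    "It's a paragraph.\n"
    "Line 2\n"
    "Those for this line.\n"
    "Another paragraph\n"
    "Line 3\n"
    "More contents.\n"
    "Line 4\n"
    "More contents..."
)

# Pattern from the question: the trailing group consumes the next header,
# so the paragraph after it has no header left to start a match from.
consuming = re.compile(r"(?:Line \d{1,3})(.*?)(?:Line \d{1,3})", re.DOTALL)

# Pattern from this answer: the next header is only looked at, not consumed.
lookahead = re.compile(r"(?:Line \d+\n)(.*?)(?=\nLine \d+\n|$)", re.DOTALL)

odd_only = [body.strip() for body in consuming.findall(text)]
paragraphs = lookahead.findall(text)

print(odd_only)    # paragraphs 1 and 3 only
print(paragraphs)  # all four paragraphs, headers excluded
```

With a single capturing group, findall returns just the captured paragraph bodies, which is roughly what reading Groups[1] from each Match would give in C#.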
Upvotes: 1
Reputation: 522741
Here is a pattern which should work:
(Line \d+.*?)(?=Line|$)
This says to match a paragraph beginning with Line, followed by anything up until the start of the next paragraph (i.e. Line) or the end of the text. The end of the text applies to the last paragraph.
You would also need to run this regex in dot-all mode (RegexOptions.Singleline in C#) or, if not, replace the .*? with [\s\S]*?.
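A minimal sketch of both variants, written in Python as an assumption for easy testing (the same pattern syntax is accepted by .NET). The sample string assumes each Line N header is on its own line, and the variable names are purely illustrative.

```python
import re

# Assumed input: each "Line N" header is on its own line.
text = (
    "Line 1\n"
    "Contents and text in the line.\n"
    "It's a paragraph.\n"
    "Line 2\n"
    "Those for this line.\n"
    "Another paragraph\n"
    "Line 3\n"
    "More contents.\n"
    "Line 4\n"
    "More contents..."
)

# Variant 1: dot-all mode lets "." cross line breaks.
with_dotall = re.findall(r"(Line \d+.*?)(?=Line|$)", text, flags=re.DOTALL)

# Variant 2: no flag needed, because [\s\S] matches any character.
without_flag = re.findall(r"(Line \d+[\s\S]*?)(?=Line|$)", text)

# Each match keeps its "Line N" header and a trailing newline; strip the latter.
blocks = [block.strip() for block in with_dotall]

for block in blocks:
    print(repr(block))
```

Note that the lookahead stops at any occurrence of the capitalised word Line, so body text containing that word would be split early; a stricter lookahead such as \nLine \d+ may be safer if that can happen in the real data.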
Upvotes: 1