Reputation: 206
I have a very large text file with several entries like this:
-------------------------------------
LOTS OF
MULTILINE
TEXT
*************************************
MORE
MULTILINE
TEXT
*************************************
EVEN-MORE-TEXT-SOMETIMES-WITH-DASHES
*************************************
-------------------------------------
2ND LOT OF
MULTILINE
TEXT
*************************************
MORE
MULTILINE
TEXT FOR 2ND LOT
*************************************
EVEN-MORE-TEXT-FOR-2ND
*************************************
Note that these are only two entries. I don't care about the asterisk lines as such; what I want is all the text that follows each dashed line.
I want to get a capture group with all the text in each entry so that I can analyze it later line by line.
I can capture the first entry with an expression like this:
/-{37}\s*([\s\S]+)-{37}/gm
But I'm having trouble matching the capture group several times, because I don't have a clear terminator for the groups (the *{37} line appears several times within each entry).
Here's a regex 101 example:
https://regex101.com/r/XZQ5h6/1
How can I capture the text after the dashed line but before the next dashed line or the end of the file?
Edit: To make my question clearer, the capture group I would expect for the first entry is:
LOTS OF
MULTILINE
TEXT
*************************************
MORE
MULTILINE
TEXT
*************************************
EVEN-MORE-TEXT-SOMETIMES-WITH-DASHES
*************************************
I also happen to have some dashes in the text, so I have edited the example. Ideally, I want an array of capture groups with just the content in the entries.
Upvotes: 2
Views: 159
Reputation: 163362
If you want to capture the 2 parts in the example data in a capture group:
^-{37}\s*^((?:(?!--).*(?:\r?\n|$))+)
In parts, the pattern matches:
^
Start of the line (the multiline flag is required for the anchors)
-{37}
Match the - char 37 times
\s*^
Match optional whitespace chars, then assert the start of a line
(
Capture group 1
(?:
Non-capture group
(?!--)
Negative lookahead, assert that the line does not start with -- (or make it more specific, like (?!-{37}\r?\n))
.*(?:\r?\n|$)
Match the whole line, followed by a newline or the end of the input
)+
Close the non-capture group and repeat it 1+ times
)
Close group 1
Or a bit shorter, but then it would include the leading whitespace chars in the match:
^-{37}((?:\r?\n(?!--).*)*)
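As a rough check of the first pattern, here is a minimal Python sketch. The names SAMPLE, ENTRY_RE and extract_entries are made up for illustration, the sample is rebuilt from the question's example, and Python's re module is an assumption on my part (the question does not name a language). The multiline flag is passed as re.MULTILINE.

```python
import re

DASHES = "-" * 37
STARS = "*" * 37

# Sample rebuilt from the question's example (two entries).
SAMPLE = "\n".join([
    DASHES, "LOTS OF", "MULTILINE", "TEXT", STARS,
    "MORE", "MULTILINE", "TEXT", STARS,
    "EVEN-MORE-TEXT-SOMETIMES-WITH-DASHES", STARS,
    DASHES, "2ND LOT OF", "MULTILINE", "TEXT", STARS,
    "MORE", "MULTILINE", "TEXT FOR 2ND LOT", STARS,
    "EVEN-MORE-TEXT-FOR-2ND", STARS,
])

# The pattern from the answer; re.MULTILINE makes ^ and $ work per line.
ENTRY_RE = re.compile(r"^-{37}\s*^((?:(?!--).*(?:\r?\n|$))+)", re.MULTILINE)


def extract_entries(text):
    """Return the text of capture group 1 for every entry."""
    return [m.group(1) for m in ENTRY_RE.finditer(text)]


if __name__ == "__main__":
    for number, entry in enumerate(extract_entries(SAMPLE), start=1):
        # Each entry can then be analyzed line by line.
        print(number, entry.splitlines())
```

Single dashes inside a line (as in EVEN-MORE-TEXT) do not end an entry, because the lookahead is only checked at the start of each line.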
Upvotes: 0
Reputation: 854
This regex will match both entries:
/-{37}[^-]+/gm
Try it out in regex101.
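One thing to check with this pattern is how it behaves on the edited example, where the text itself contains dashes. A small Python sketch (assuming Python's re module; SAMPLE is a hypothetical name for data rebuilt from the question) suggests that both entries are found, but each match stops at the first dash inside the text:

```python
import re

DASHES = "-" * 37
STARS = "*" * 37

# Sample rebuilt from the question's edited example (two entries).
SAMPLE = "\n".join([
    DASHES, "LOTS OF", "MULTILINE", "TEXT", STARS,
    "MORE", "MULTILINE", "TEXT", STARS,
    "EVEN-MORE-TEXT-SOMETIMES-WITH-DASHES", STARS,
    DASHES, "2ND LOT OF", "MULTILINE", "TEXT", STARS,
    "MORE", "MULTILINE", "TEXT FOR 2ND LOT", STARS,
    "EVEN-MORE-TEXT-FOR-2ND", STARS,
])

# findall plays the role of the g flag; the m flag has no effect here
# because the pattern contains no ^ or $ anchors.
matches = re.findall(r"-{37}[^-]+", SAMPLE)

if __name__ == "__main__":
    for match in matches:
        # Show only the last line of each match to see where it stops.
        print(repr(match.splitlines()[-1]))
```

So this pattern fits data without dashes in the entry text; for the edited example, a line-based lookahead is likely the safer choice.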
Upvotes: 0
Reputation: 785196
You can use this regex:
-{37}\R+((?:.+\R)+)
RegEx Detail:
-{37}
: Match 37 hyphens
\R+
: Match 1+ line breaks
(
: Start capture group
(?:.+\R)+
: Match a line of 1+ characters followed by a line break. Repeat this group 1+ times to match multiple such lines
)
: End capture group
Upvotes: 1