Reputation: 55
I'm new to regular expressions and am trying to extract text from a string. Each piece starts with a value in parentheses at the beginning of a line and runs until the next line that starts with a value in parentheses.
My string:
(1x) cat
dog
(2) ele(4)phant
tiger
(x) fish
bird
I need to get:
- "1x" and "cat\r\ndog"
- "2" and "ele(4)phant\r\ntiger"
- "x" and "fish\r\nbird"
My regex:
(\r\n)*(\((.*?)\))(.*)
This gets me:
Match 1
Full match 0-8 `(1x) cat`
Group 2. 0-4 `(1x)`
Group 3. 1-3 `1x`
Group 4. 4-8 ` cat`
Match 2
Full match 13-28 `(2) ele(4)phant`
Group 2. 13-16 `(2)`
Group 3. 14-15 `2`
Group 4. 16-28 ` ele(4)phant`
Match 3
Full match 35-44 `(x) fish `
Group 2. 35-38 `(x)`
Group 3. 36-37 `x`
Group 4. 38-44 ` fish `
The problem is that my regex seems to stop at the end of the line, so the text on the following lines (dog, tiger, bird) is missing.
Do you have an idea how to also get the content of the next lines until the next match?
Upvotes: 1
Views: 118
Reputation: 626689
You may use
'~^\(([^()]*)\)(.*(?:\R(?!\([^()]*\)).*)*)~m'
See the regex demo
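As a quick way to try the pattern on the sample string from the question, here is a minimal sketch in Python. The question does not name a language, so this is an assumption, and it is a port rather than the original: Python's `re` module has no `\R`, so `\r?\n` stands in for it (which assumes the input uses `\n` or `\r\n` line endings). The names `NEWLINE`, `PATTERN`, and `extract_blocks` are made up for illustration, and the `strip()` call is an addition that removes the leading space and trailing `\r` the raw Group 2 would otherwise carry.

```python
import re

# Assumption: input lines end with "\n" or "\r\n". Python's `re` has no \R,
# so this fragment replaces it in the ported pattern.
NEWLINE = r"\r?\n"

PATTERN = re.compile(
    r"^\(([^()]*)\)"      # Group 1: text inside the (...) that starts a line
    r"(.*(?:" + NEWLINE   # Group 2: rest of that line, then any following lines
    + r"(?!\([^()]*\)).*)*)",  # that do not themselves start with (...)
    re.MULTILINE,         # make ^ match at the start of every line
)


def extract_blocks(text):
    """Return (label, body) pairs for each block that starts with '(label)'.

    The body is stripped of surrounding whitespace; that step is not part of
    the pattern itself.
    """
    return [(m.group(1), m.group(2).strip()) for m in PATTERN.finditer(text)]


sample = "(1x) cat\r\ndog\r\n(2) ele(4)phant\r\ntiger\r\n(x) fish\r\nbird"
for label, body in extract_blocks(sample):
    print(repr(label), repr(body))
```

On the sample input this yields the three pairs asked for in the question: `1x` with `cat\r\ndog`, `2` with `ele(4)phant\r\ntiger`, and `x` with `fish\r\nbird`. The `(4)` inside `ele(4)phant` does not start a new block because the pattern only looks for `(...)` at the start of a line.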
Details
- `^` - start of a line (due to the `m` modifier, `^` matches the start of a line rather than the start of the whole string)
- `\(` - a `(`
- `([^()]*)` - Group 1: `[^()]*` - 0+ chars other than `(` and `)` (you might use your `.*?` here, if you do not want to overflow across lines, and want to match `(` inside `(...)`)
- `\)` - a `)` char
- `(.*(?:\R(?!\([^()]*\)).*)*)` - Group 2:
  - `.*` - the rest of the line
  - `(?:\R(?!\([^()]*\)).*)*` - 0+ sequences of
    - `\R(?!\([^()]*\))` - line break not followed with `(...)` substring
    - `.*` - rest of the line
Upvotes: 1