Regex to match until newline, unless the next line starts with four spaces

Question

Given the following text:

[chat] somebody: this is some text that needs to be matched.
    it continues on this line with four spaces at the bottom
    and its also on this line.
It should not match this line
 
[chat] somebody: this is more text, but only one line.
This should not be matched

Each message I'm trying to process can extend onto the next line, but if it does, the next line will start with four spaces. So far I've come up with the following regular expression, which stops at a newline, but I can't figure out how to get the match to continue.

(^|\r|\n)\[([a-z]+)\]\s+([a-z]+):?\s+([^\n]+)

I've also tried:

(^|\r|\n)\[([a-z]+)\]\s+([a-z]+):?\s+(.+?\n[^    ]+)

Which partially works, but still matches the first word on the incorrect lines.

Any suggestions as to how I could do this with a regular expression?

The fourth bird · Accepted Answer

You can match the start of the pattern with the square brackets, and then optionally match 4 spaces at the start of the next line followed by the rest of the line.

Note that \s can also match a newline.
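To illustrate the note about \s, here is a minimal Python sketch. The sample string is made up for the demonstration; it puts the line break right after the closing bracket to show the difference between \s+ and a whitespace class that excludes newlines.

```python
import re

# Hypothetical input: the name sits on the line AFTER the bracketed tag.
text = "[chat]\nsomebody: hello"

# \s+ also matches \n, so this pattern crosses the line break.
loose = re.match(r"\[[a-z]+]\s+[a-z]+:", text)

# [^\S\r\n]+ means "whitespace, but not CR or LF", so it stays on one line.
strict = re.match(r"\[[a-z]+][^\S\r\n]+[a-z]+:", text)

print(loose is not None)   # True
print(strict is not None)  # False
```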

^\[([a-z]+)][^\S\r\n]+([a-z]+):(.*(?:\r?\n {4}.*)*)

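^ Start of string (or start of each line when the multiline flag is enabled, which is needed to match every message in the text)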
\[([a-z]+)] Capture chars a-z between square brackets in group 1
[^\S\r\n]+ Match 1+ whitespace chars without a newline
([a-z]+): Capture 1+ chars a-z in group 2 and then match :
( Capture group 3
- .* Match the rest of the line
- (?:\r?\n {4}.*)* Optionally repeat matching all lines that start with 4 spaces
) Close group 3
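
As a rough sketch of how the pattern could be used, here it is applied to the sample text from the question in Python. The names SAMPLE, MESSAGE_RE and parse_messages are illustrative and not part of the answer, and joining the continuation lines with a single space is an assumption about the desired output.

```python
import re

SAMPLE = (
    "[chat] somebody: this is some text that needs to be matched.\n"
    "    it continues on this line with four spaces at the bottom\n"
    "    and its also on this line.\n"
    "It should not match this line\n"
    " \n"
    "[chat] somebody: this is more text, but only one line.\n"
    "This should not be matched\n"
)

# re.MULTILINE lets ^ match at the start of every line, not only the string.
MESSAGE_RE = re.compile(
    r"^\[([a-z]+)][^\S\r\n]+([a-z]+):(.*(?:\r?\n {4}.*)*)",
    re.MULTILINE,
)

def parse_messages(text):
    """Return a list of (channel, user, message) tuples."""
    messages = []
    for match in MESSAGE_RE.finditer(text):
        channel, user, body = match.groups()
        # Group 3 still holds the line breaks and the four-space indents;
        # strip each line and join them (an assumption about what is wanted).
        lines = [line.strip() for line in body.splitlines()]
        messages.append((channel, user, " ".join(lines)))
    return messages

for channel, user, message in parse_messages(SAMPLE):
    print(f"{channel} | {user} | {message}")
```

The two lines that do not start with a bracketed tag or four spaces are skipped, because the repeated group only accepts a newline that is followed by exactly the four-space prefix.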


A slightly broader match than a-z could use \w instead to match a word character, or use a negated character class:

^\[([^][\r\n]*)][^\S\r\n]+([^:\r\n]+):(.*(?:\r?\n {4}.*)*)
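
A small Python comparison of the two patterns, as a sketch only. The sample line with a digit, a hyphen and a space in the name is invented for the demonstration, and the names TAIL, NARROW_RE and BROAD_RE are illustrative.

```python
import re

# Shared tail: the colon, the rest of the line, then any indented lines.
TAIL = r":(.*(?:\r?\n {4}.*)*)"

NARROW_RE = re.compile(r"^\[([a-z]+)][^\S\r\n]+([a-z]+)" + TAIL, re.MULTILINE)
BROAD_RE = re.compile(r"^\[([^][\r\n]*)][^\S\r\n]+([^:\r\n]+)" + TAIL, re.MULTILINE)

# Hypothetical message whose tag and name fall outside a-z.
sample = (
    "[chat-2] Some Body: hello there\n"
    "    second line\n"
    "not part of the message\n"
)

print(NARROW_RE.search(sample))   # None: "-2" and "Some Body" are not a-z only

m = BROAD_RE.search(sample)
print(repr(m.group(1)))           # 'chat-2'
print(repr(m.group(2)))           # 'Some Body'
print(repr(m.group(3)))           # ' hello there\n    second line'
```

The negated classes exclude \r and \n so that neither the tag nor the name can run past the end of the line.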

