Kyle

Reputation: 17687

Regex to match until newline, unless the next line starts with four spaces

Given the following text:

[chat] somebody: this is some text that needs to be matched.
    it continues on this line with four spaces at the bottom
    and its also on this line.
It should not match this line
 
[chat] somebody: this is more text, but only one line.
This should not be matched

Each message I'm trying to process can extend onto the next line, but if it does, the next line will start with four spaces. So far I've come up with the following regular expression, which stops at a newline, but I can't figure out how to get the match to continue.

(^|\r|\n)\[([a-z]+)\]\s+([a-z]+):?\s+([^\n]+)
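To see where this stops, here is a minimal Python sketch using the standard re module. SAMPLE is simply the example text above written as a string literal. Because group 4 is [^\n]+, it can only ever capture the first line of a message:

```python
import re

# The example input from the question (continuation lines start with four spaces).
SAMPLE = (
    "[chat] somebody: this is some text that needs to be matched.\n"
    "    it continues on this line with four spaces at the bottom\n"
    "    and its also on this line.\n"
    "It should not match this line\n"
    " \n"
    "[chat] somebody: this is more text, but only one line.\n"
    "This should not be matched"
)

# First attempt from the question: group 4 is [^\n]+, which cannot cross a newline.
FIRST_ATTEMPT = r"(^|\r|\n)\[([a-z]+)\]\s+([a-z]+):?\s+([^\n]+)"

match = re.search(FIRST_ATTEMPT, SAMPLE)
print(repr(match.group(4)))  # only the first line of the first message
```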

I've also tried:

(^|\r|\n)\[([a-z]+)\]\s+([a-z]+):?\s+(.+?\n[^    ]+)

This partially works, but it still matches the first word of the lines that should not be matched.
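A similar Python sketch (same assumption: SAMPLE is the example text above) suggests why this attempt picks up a stray word. [^    ] is a character class, so it matches a single character that is not a space; the four spaces inside the brackets behave exactly like one. After the lazy .+? reaches the first newline, [^    ]+ grabs the next run of non-space characters, i.e. the first word of the following line:

```python
import re

# The example input from the question (continuation lines start with four spaces).
SAMPLE = (
    "[chat] somebody: this is some text that needs to be matched.\n"
    "    it continues on this line with four spaces at the bottom\n"
    "    and its also on this line.\n"
    "It should not match this line\n"
    " \n"
    "[chat] somebody: this is more text, but only one line.\n"
    "This should not be matched"
)

# Second attempt from the question. [^    ] matches ONE non-space character,
# so [^    ]+ matches the first word on the line after the newline.
SECOND_ATTEMPT = r"(^|\r|\n)\[([a-z]+)\]\s+([a-z]+):?\s+(.+?\n[^    ]+)"

results = re.findall(SECOND_ATTEMPT, SAMPLE)  # list of 4-tuples (one per match)
for groups in results:
    print(repr(groups[3]))
```

With this input the sketch finds only the second message, and its captured text ends in "\nThis". The first message is not matched at all: its second line begins with a space, which [^    ] rejects, and .+? cannot cross a newline without the DOTALL flag.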

Any suggestions as to how I could do this with a regular expression?

Upvotes: 2

Views: 3560

Answers (2)

The fourth bird

Reputation: 163557

You can match the start of the message with the part in square brackets, and then optionally match any number of following lines that start with 4 spaces, each followed by the rest of that line.

Note that \s can also match a newline.

^\[([a-z]+)][^\S\r\n]+([a-z]+):(.*(?:\r?\n {4}.*)*)
  • ^ Start of string (or start of each line when the multiline flag is enabled)
  • \[([a-z]+)] Capture chars a-z between square brackets in group 1
  • [^\S\r\n]+ Match 1+ whitespace chars without a newline
  • ([a-z]+): Capture 1+ chars a-z in group 2 and then match :
  • ( Capture group 3
    • .* Match the rest of the line
    • (?:\r?\n {4}.*)* Optionally repeat matching all lines that start with 4 spaces
  • ) Close group 3

Regex demo
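A minimal Python sketch of this pattern, assuming Python's re module. Passing re.MULTILINE is an assumption added here so that ^ matches at the start of each line; without it, only the message at the very start of the string would be found. The re.sub call that folds the continuation lines into one line is a hypothetical post-processing step, not part of the answer:

```python
import re

# The example input from the question (continuation lines start with four spaces).
SAMPLE = (
    "[chat] somebody: this is some text that needs to be matched.\n"
    "    it continues on this line with four spaces at the bottom\n"
    "    and its also on this line.\n"
    "It should not match this line\n"
    " \n"
    "[chat] somebody: this is more text, but only one line.\n"
    "This should not be matched"
)

# Pattern from the answer; re.MULTILINE (assumed) lets ^ match at every line start.
MESSAGE = re.compile(
    r"^\[([a-z]+)][^\S\r\n]+([a-z]+):(.*(?:\r?\n {4}.*)*)",
    re.MULTILINE,
)

messages = MESSAGE.findall(SAMPLE)  # list of (group 1, group 2, group 3) tuples

for channel, user, body in messages:
    # Hypothetical post-processing: replace each "newline + 4 spaces" with one space.
    text = re.sub(r"\r?\n {4}", " ", body).strip()
    print(f"{channel} | {user} | {text}")
```

Group 3 keeps the space after the colon and the raw newlines, so the folding step is only needed if you want each message on a single line.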

For a broader match than a-z, you could use \w to match any word character, or use a negated character class:

^\[([^][\r\n]*)][^\S\r\n]+([^:\r\n]+):(.*(?:\r?\n {4}.*)*)

Regex demo
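To compare the two patterns, here is a Python sketch with a hypothetical log line whose channel (dev-ops) and user name (Jane Doe) fall outside a-z. The stricter pattern finds nothing, while the negated-class version captures both names and the continuation line (re.MULTILINE is again an assumption added so that ^ matches at each line start):

```python
import re

# Hypothetical input: the channel contains "-" and the user name has a capital and a space.
LOG = (
    "[dev-ops] Jane Doe: deploy done\n"
    "    rollback plan is ready\n"
    "unrelated line\n"
)

# First pattern from the answer: only a-z allowed in channel and user names.
STRICT = re.compile(r"^\[([a-z]+)][^\S\r\n]+([a-z]+):(.*(?:\r?\n {4}.*)*)", re.MULTILINE)

# Broader pattern: anything except brackets/newlines in the channel,
# anything except a colon/newline in the user name.
BROAD = re.compile(r"^\[([^][\r\n]*)][^\S\r\n]+([^:\r\n]+):(.*(?:\r?\n {4}.*)*)", re.MULTILINE)

strict_hits = STRICT.findall(LOG)
broad_hits = BROAD.findall(LOG)
print(strict_hits)
print(broad_hits)
```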

Upvotes: 3

Bohemian

Reputation: 425278

Match, using the DOTALL flag, until the next character is a newline that itself is not followed by 4 spaces:

(?s)^\[[a-z]+].*?(?=\n(?!    ))

See live demo.

If your regex engine doesn’t support inline flags, use its DOTALL modifier instead (m in Ruby, s in PCRE and JavaScript), e.g. in Ruby:

/^\[[a-z]+].*?(?=\n(?!    ))/m
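The inline-flag version works in Python as well, since Python's re accepts a global (?s) at the start of a pattern. In this sketch, passing re.MULTILINE is an assumption added here: without it, ^ only matches at the start of the string, so only the first message is returned:

```python
import re

# The example input from the question (continuation lines start with four spaces).
SAMPLE = (
    "[chat] somebody: this is some text that needs to be matched.\n"
    "    it continues on this line with four spaces at the bottom\n"
    "    and its also on this line.\n"
    "It should not match this line\n"
    " \n"
    "[chat] somebody: this is more text, but only one line.\n"
    "This should not be matched"
)

# Pattern from the answer: (?s) lets "." cross newlines, and the lazy .*? stops
# at the first newline that is NOT followed by four spaces.
PATTERN = r"(?s)^\[[a-z]+].*?(?=\n(?!    ))"

# Without re.MULTILINE, ^ only matches at the start of the string.
only_first = re.findall(PATTERN, SAMPLE)

# With re.MULTILINE (assumed, not part of the answer), ^ matches at every line start.
every_message = re.findall(PATTERN, SAMPLE, re.MULTILINE)

for message in every_message:
    print(repr(message))
```

One caveat worth checking: the lookahead needs a newline after the message, so a message that ends the input without a trailing newline is not matched. A lookahead such as (?=\n(?!    )|\Z) is one way to also accept the end of the string.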

Upvotes: 1
