Reputation: 17687
Given the following text:
[chat] somebody: this is some text that needs to be matched.
    it continues on this line with four spaces at the bottom
    and its also on this line.
It should not match this line
[chat] somebody: this is more text, but only one line.
This should not be matched
Each message I'm trying to process can extend onto the next line, but if it does, the next line will start with four spaces. So far I've come up with the following regular expression, which stops at a newline, but I can't figure out how to get the match to continue.
(^|\r|\n)\[([a-z]+)\]\s+([a-z]+):?\s+([^\n]+)
I've also tried:
(^|\r|\n)\[([a-z]+)\]\s+([a-z]+):?\s+(.+?\n[^ ]+)
Which partially works, but still matches the first word on the incorrect lines.
Any suggestions as to how I could do this with a regular expression?
Upvotes: 2
Views: 3560
Reputation: 163557
You can match the start of the pattern with the square brackets, and then optionally match 4 spaces at the start of the next line followed by the rest of the line.
Note that \s can also match a newline.
^\[([a-z]+)][^\S\r\n]+([a-z]+):(.*(?:\r?\n {4}.*)*)
^
Start of string (or of each line, when the multiline flag is enabled)
\[([a-z]+)]
Capture chars a-z between square brackets in group 1
[^\S\r\n]+
Match 1+ whitespace chars without a newline
([a-z]+):
Capture 1+ chars a-z in group 2 and then match :
(
Capture group 3
.*
Match the rest of the line
(?:\r?\n {4}.*)*
Optionally repeat matching all lines that start with 4 spaces
)
Close group 3

A slightly broader match than a-z would be to use \w instead to match a word character, or to use a negated character class:
^\[([^][\r\n]*)][^\S\r\n]+([^:\r\n]+):(.*(?:\r?\n {4}.*)*)
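To see the first pattern in action, here is a minimal Python sketch. It assumes the multiline flag is enabled so that ^ matches at each line start, and it assumes the sample text is laid out with the line breaks shown (they are reconstructed from the question). The names MESSAGE_RE, SAMPLE and parse_messages are made up for this illustration.

```python
import re

# Pattern from the answer above; MULTILINE lets ^ match at every line start.
MESSAGE_RE = re.compile(
    r"^\[([a-z]+)][^\S\r\n]+([a-z]+):(.*(?:\r?\n {4}.*)*)",
    re.MULTILINE,
)

# Sample input based on the question (the line breaks are an assumption).
SAMPLE = (
    "[chat] somebody: this is some text that needs to be matched.\n"
    "    it continues on this line with four spaces at the bottom\n"
    "    and its also on this line.\n"
    "It should not match this line\n"
    "[chat] somebody: this is more text, but only one line.\n"
    "This should not be matched\n"
)


def parse_messages(text):
    """Return (tag, sender, body) tuples; continuation lines are joined with a space."""
    messages = []
    for match in MESSAGE_RE.finditer(text):
        tag, sender, raw_body = match.groups()
        # Strip the leading space and the four-space indents, then re-join.
        body = " ".join(line.strip() for line in raw_body.splitlines())
        messages.append((tag, sender, body))
    return messages


if __name__ == "__main__":
    for tag, sender, body in parse_messages(SAMPLE):
        print(f"[{tag}] {sender}: {body}")
```

Group 3 holds the raw body including the indented continuation lines; joining them with a single space is just one way to normalise it, so adapt that step to whatever you need downstream.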
Upvotes: 3
Reputation: 425278
Match, using the DOTALL flag, until the next character is a newline that itself is not followed by 4 spaces:
(?s)^\[[a-z]+].*?(?=\n(?! ))
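If your regex engine doesn't support flags in the expression, set the modifier on the pattern instead. Which letter to use depends on the engine: in Ruby DOTALL is m, while in PCRE-style engines it is s, and m (multiline) is also needed there so that ^ matches at each line start. For example, in Ruby: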
/^\[[a-z]+].*?(?=\n(?! ))/m
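For comparison, a rough Python sketch of this lookahead approach. It assumes both DOTALL (so . crosses newlines) and MULTILINE (so ^ matches at each line start) are set, and uses the pattern exactly as written above. SPAN_RE, SAMPLE and find_message_spans are names invented for the example, and the sample's line breaks are reconstructed from the question.

```python
import re

# Pattern as written in the answer; flags are passed separately here.
# DOTALL: "." also matches newlines. MULTILINE: "^" matches at each line start.
SPAN_RE = re.compile(
    r"^\[[a-z]+].*?(?=\n(?! ))",
    re.DOTALL | re.MULTILINE,
)

# Sample input based on the question (the line breaks are an assumption).
SAMPLE = (
    "[chat] somebody: this is some text that needs to be matched.\n"
    "    it continues on this line with four spaces at the bottom\n"
    "    and its also on this line.\n"
    "It should not match this line\n"
    "[chat] somebody: this is more text, but only one line.\n"
    "This should not be matched\n"
)


def find_message_spans(text):
    """Return each whole message (first line plus indented continuations) as one string."""
    return [match.group(0) for match in SPAN_RE.finditer(text)]


if __name__ == "__main__":
    for span in find_message_spans(SAMPLE):
        print(repr(span))
```

One thing to watch: the lookahead requires a newline after the message, so a final message with no trailing newline is not matched by this pattern as written.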
Upvotes: 1