Reputation: 39
Hello, I'm fairly new to regex and have a small, maybe simple question.
I have the following text:
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
17.11.2020 15:32 typical Pat. seems sleeping
Additional test
My current regex (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)
matches only up to "sleeping", but it correctly creates 3 matches.
However, I also need the "Additional test"
text in the second group.
I tried something like (\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*)
but now I get only one huge match, because the second group takes everything until the end.
How can I match everything until a new line starting with a date begins, and start a new match from there?
Upvotes: 3
Views: 68
Reputation: 163277
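Repeating the character class [,.:\w\s] with the * quantifier matches too much:
\s also matches newlines and \w also matches digits, so the class runs across the line breaks and through the following dates, all the way to the end of the text.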
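To make the difference concrete, here is a small Python sketch that runs both patterns from the question against the sample text. Python and the variable names are only an assumption for illustration; the thread does not say which regex engine is in use.

```python
import re

# The sample text from the question.
text = (
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test"
)

# First attempt: (.*) cannot cross a newline, so group 2 ends at "sleeping".
line_only = re.findall(r"(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?(.*)", text)

# Second attempt: the class contains \s (newlines) and \w (digits),
# so the first match swallows the rest of the text.
greedy = re.findall(r"(\d{2}.\d{2}.\d{4}\s\d{2}:\d{2})\s?([,.:\w\s]*)", text)

print(len(line_only), len(greedy))  # prints: 3 1
```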
You can match the rest of the line with .* (the dot does not match a newline), then match the newline and the next line in the same group, using (.*\r?\n.*)
The dots in the date are escaped here so that they only match a literal dot:
^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)
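As a quick check, the one-extra-line pattern can be tried in Python like this. This is only a sketch: Python is an assumed engine, the dots in the date are escaped so they match a literal dot, and re.MULTILINE is needed so that ^ matches at the start of every line rather than only at the start of the text.

```python
import re

# The sample text from the question.
text = (
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test"
)

# Group 1: date and time. Group 2: rest of the line plus exactly one more line.
one_extra_line = re.compile(
    r"^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s?(.*\r?\n.*)",
    re.MULTILINE,
)

entries = one_extra_line.findall(text)
for stamp, body in entries:
    print(stamp, "->", repr(body))
```

Note that this pattern requires exactly one following line, so an entry with no extra line would pull the next date line into group 2.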
If multiple lines can follow, match all following lines that do not start with a date-like pattern.
^(\d{2}\.\d{2}\.\d{4})\s*(.*(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)*)
Explanation
^ - Start of the string
( - Capture group 1
\d{2}\.\d{2}\.\d{4} - Match a date-like pattern
) - Close group 1
\s* - Match 0+ whitespace chars (or match whitespace chars without newlines using [^\S\r\n]*)
( - Capture group 2
.* - Match the whole line
(?:\r?\n(?!\d{2}\.\d{2}\.\d{4}).*)* - Optionally repeat matching the whole line if it does not start with a date-like pattern
) - Close group 2
Upvotes: 1
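A minimal Python sketch of the negative-lookahead pattern above, assuming Python as the engine. The sample text is a made-up variation of the question's data with a different number of follow-up lines per entry. The second pattern, with_time, is a hypothetical variant that is not part of the answer: it moves the time into group 1 and uses [^\S\r\n]* so the separator cannot consume a newline.

```python
import re

# Illustrative input: entries with two, zero and one follow-up lines.
text = (
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "Second additional line\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test"
)

DATE = r"\d{2}\.\d{2}\.\d{4}"

# Pattern as given in the answer: group 1 is the date only,
# so the time ends up at the start of group 2.
as_answered = re.compile(
    rf"^({DATE})\s*(.*(?:\r?\n(?!{DATE}).*)*)",
    re.MULTILINE,
)

# Hypothetical variant: date and time together in group 1.
with_time = re.compile(
    rf"^({DATE}\s\d{{2}}:\d{{2}})[^\S\r\n]*(.*(?:\r?\n(?!{DATE}).*)*)",
    re.MULTILINE,
)

answered_entries = as_answered.findall(text)
timed_entries = with_time.findall(text)

for stamp, body in timed_entries:
    print(stamp, "->", repr(body))
```

Each continuation line is only consumed if the lookahead confirms it does not begin with a date, which is what lets a new match start at the next date line.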
Reputation: 626748
If you are sure there is only one additional line to be matched, you can use
(?m)^(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2})\s*(.*(?:\n.*)?)
See the regex demo. Details:
(?m) - a multiline modifier
^ - start of a line
(\d{2}\.\d{2}\.\d{4}\s\d{2}:\d{2}) - Group 1: a datetime string
\s* - zero or more whitespaces
(.*(?:\n.*)?) - Group 2: any zero or more chars other than a newline char, as many as possible, and then an optional line: a newline followed by any zero or more chars other than a newline char, as many as possible.
If there can be any number of lines, you may consider
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2})[\p{Zs}\t]*(?s)(.*?)(?=\n\d{2}\.\d{2}\.\d{4}|\z)
See this regex demo. Here,
(?m)^(\d{2}\.\d{2}\.\d{4}[\p{Zs}\t]\d{2}:\d{2}) - matches the same as above, except that \s is replaced with [\p{Zs}\t], which only matches horizontal whitespace
[\p{Zs}\t]* - 0+ horizontal whitespace chars
(?s) - from here on, . matches any char, including a newline
(.*?) - Group 2: any zero or more chars, as few as possible
(?=\n\d{2}\.\d{2}\.\d{4}|\z) - up to the leftmost occurrence of a newline followed by a date string, or up to the end of the string.
Upvotes: 1
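The lazy-match-plus-lookahead pattern above uses syntax that Python's built-in re module does not accept as written, so the sketch below is an adaptation under stated assumptions rather than the answer's exact regex: [\p{Zs}\t] is swapped for [^\S\r\n] (any whitespace except line breaks), the mid-pattern (?s) becomes the scoped group (?s:...), and \z becomes \Z, which is Python's end-of-string anchor.

```python
import re

# The sample text from the question.
text = (
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test\n"
    "17.11.2020 15:32 typical Pat. seems sleeping\n"
    "Additional test"
)

# Group 1: date and time. Group 2: as few chars as possible (newlines included)
# up to a newline followed by a date, or up to the end of the string.
lazy_until_next_date = re.compile(
    r"(?m)^(\d{2}\.\d{2}\.\d{4}[^\S\r\n]\d{2}:\d{2})[^\S\r\n]*"
    r"((?s:.*?))"
    r"(?=\n\d{2}\.\d{2}\.\d{4}|\Z)"
)

lazy_entries = lazy_until_next_date.findall(text)
for stamp, body in lazy_entries:
    print(stamp, "->", repr(body))
```

Because the lookahead does not consume the newline before the next date, each entry's body stops right before it. If the input ends with a trailing newline, the last body will include it.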