Reputation: 477
I created a regex to match the following pattern:
<some text>
yyyy.MM.dd SOME TEXT decimal decimal
yyyy.MM.dd
some sentence
some sentence
some sentence (there can be from 1 to n comment lines, and none of them starts with yyyy.MM.dd SOME TEXT decimal decimal)
yyyy.MM.dd SOME TEXT decimal decimal
yyyy.MM.dd
some sentence
some sentence
some sentence
...
<some text>
The regex:
((\d{4}\.\d{2}\.\d{2})\s([a-zA-Z\s]{0,})\s(\-{0,1}((\d{1}\,\d{2})|(\d{1,}\ \d{3}\,\d{2})))\s(\-{0,1}((\d{1}\,\d{2})|(\d{1,}\ \d{3}\,\d{2}))\s)(\d{4}\.\d{2}\.\d{2}))
It matches only the first 2 lines of each record. I can't match the multiline sentences up to, but not including, the next yyyy.MM.dd SOME TEXT decimal decimal line.
This is the test data for matching:
2020.11.01 SOME TEXT -17,30 83 016,86
2020.10.30
Some text that should be
matched 20.01.2020 as
multiline text
until now
2020.11.01 SOME TEXT -27,30 81 016,86
2020.10.30
Some text that should be
matched 20.01.2020 as
multiline text
until now
...
It should match like this:
1.
2020.11.01 SOME TEXT -17,30 83 016,86
2020.10.30
Some text that should be
matched 20.01.2020 as
multiline text
until now
2.
2020.11.01 SOME TEXT -27,30 81 016,86
2020.10.30
Some text that should be
matched 20.01.2020 as
multiline text
until now
For me it matches like this:
1.
2020.11.01 SOME TEXT -17,30 83 016,86
2020.10.30
2.
2020.11.01 SOME TEXT -27,30 81 016,86
2020.10.30
How can I match one or more of the following lines, stopping before the next line that starts with 'yyyy.MM.dd SOME TEXT decimal decimal'?
Upvotes: 1
Views: 66
Reputation: 163207
For the example data, you can match the first 2 lines, which both start with a date-like pattern, followed by all the lines that do not start with a date-like pattern.
Note that \d{4}\.\d{2}\.\d{2} does not validate a date itself; it only checks the shape of the digits. To get a more precise match, this page has more detailed examples.
^\d{4}\.\d{2}\.\d{2} .*\r?\n\d{4}\.\d{2}\.\d{2}\b.*(?:\r?\n(?!\d{4}\.\d{2}\.\d{2}\b).*)*
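To try the pattern above against the test data from the question, here is a minimal Python sketch. The names RECORD and SAMPLE are made up for illustration, and it assumes the multiline flag is enabled so that ^ can match at the start of every line, not only at the start of the input.

```python
import re

# The pattern from the answer, split over three string literals for readability.
# re.MULTILINE makes ^ match at the start of each line.
RECORD = re.compile(
    r"^\d{4}\.\d{2}\.\d{2} .*\r?\n"           # header line: date, space, rest
    r"\d{4}\.\d{2}\.\d{2}\b.*"                # second line: starts with a date
    r"(?:\r?\n(?!\d{4}\.\d{2}\.\d{2}\b).*)*", # following lines not starting with a date
    re.MULTILINE,
)

# Test data copied from the question.
SAMPLE = """2020.11.01 SOME TEXT -17,30 83 016,86
2020.10.30
Some text that should be
matched 20.01.2020 as
multiline text
until now
2020.11.01 SOME TEXT -27,30 81 016,86
2020.10.30
Some text that should be
matched 20.01.2020 as
multiline text
until now"""

records = [m.group(0) for m in RECORD.finditer(SAMPLE)]

for number, record in enumerate(records, start=1):
    print(f"{number}.")
    print(record)
```

The date in the middle of "matched 20.01.2020 as" does not end a record, because the negative lookahead is only evaluated directly after a newline, that is, at the start of a line.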
Or, if you first want to match one or more consecutive lines that start with a date-like pattern, followed by the lines that do not:
^\d{4}\.\d{2}\.\d{2} \S.*(?:\r?\n\d{4}\.\d{2}\.\d{2}\b.*)+(?:\r?\n(?!\d{4}\.\d{2}\.\d{2}\b).*)*
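As a rough sketch of this second variant in Python, the snippet below also checks that the leading date of each match is a real calendar date, since the regex only checks its shape. The names DATE, RECORD, SAMPLE and is_real_date are hypothetical, and using datetime.strptime for the check is an assumption about how you might validate, not part of the pattern itself.

```python
import re
from datetime import datetime

# Date-like fragment reused inside the larger pattern.
DATE = r"\d{4}\.\d{2}\.\d{2}"

RECORD = re.compile(
    rf"^{DATE} \S.*"               # header line: date, space, non-whitespace char, rest
    rf"(?:\r?\n{DATE}\b.*)+"       # one or more lines starting with a date
    rf"(?:\r?\n(?!{DATE}\b).*)*",  # following lines not starting with a date
    re.MULTILINE,                  # let ^ match at the start of every line
)


def is_real_date(text: str) -> bool:
    """Return True if text is a valid calendar date in yyyy.MM.dd form."""
    try:
        datetime.strptime(text, "%Y.%m.%d")
        return True
    except ValueError:
        return False


SAMPLE = """2020.11.01 SOME TEXT -17,30 83 016,86
2020.10.30
Some text that should be
multiline text
2020.11.01 SOME TEXT -27,30 81 016,86
2020.10.30
until now"""

records = [m.group(0) for m in RECORD.finditer(SAMPLE)]

# The first 10 characters of each match are the date-like prefix.
valid = [is_real_date(record[:10]) for record in records]
```

One thing to keep in mind with this variant: because the date-line group repeats, a record that has no comment lines would run on into the header of the next record.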
Explanation
^
Start of the string (or of each line when the multiline flag is enabled)
\d{4}\.\d{2}\.\d{2} \S.*
Match a date-like pattern followed by a space, at least one non-whitespace char (for SOME TEXT in the example) and the rest of the line
(?:\r?\n\d{4}\.\d{2}\.\d{2}\b.*)+
Repeat 1+ times, matching lines that start with a date-like pattern
(?:
Non-capture group (to repeat as a whole)
\r?\n
Match a newline
(?!\d{4}\.\d{2}\.\d{2}\b)
Assert that there is no date-like format directly to the right
.*
If the previous assertion is true, match the whole line
)*
Close the group and optionally repeat it for all lines that do not start with a date-like pattern (if there should be at least 1 such line, change the quantifier to +)
Upvotes: 2