Reputation: 923
I am developing a regex to match a pattern in a file I want to process.
The file contains several articles, which all follow a similar pattern:
I am trying to come up with a non-greedy regex that accurately matches the start, body, and end of each article.
For 1-4 I have ^n\W+Dokument.+?[\r\n][\r\n]\W+Copyright[^\n]+\n
What is necessary for 5-6?
Do I actually need the dotall flag if I want this regex to match the pattern several times in one file?
I have been on this all day. Can someone with a fresh mind show me the missing bits?
Cheers, Andrew
Upvotes: 0
Views: 66
Reputation: 13650
You can use the following:
- one optional line containing non-word characters followed by more characters (but not the rights notice) and a newline
(\W+?(?:(?!All|Alle).)+?\n)?
- one line containing non-word characters followed by either "All Rights Reserved" or "Alle Rechte vorbehalten" and a newline
\W+(All Rights Reserved|Alle Rechte vorbehalten)\n
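To see how these two pieces behave together, here is a minimal Python sketch. The two fragments are copied from above; the sample lines (`* Publisher note` and so on) are invented for illustration, since the real file layout is not shown, and Python's `re` module is only assumed as the regex engine.

```python
import re

# Fragments copied from the answer; everything else here is an assumption.
OPTIONAL_LINE = r"(\W+?(?:(?!All|Alle).)+?\n)?"
RIGHTS_LINE = r"\W+(All Rights Reserved|Alle Rechte vorbehalten)\n"
tail = re.compile(OPTIONAL_LINE + RIGHTS_LINE)

# Hypothetical endings of an article: one with an extra line, one without.
with_extra = "* Publisher note\n* All Rights Reserved\n"
without_extra = "* Alle Rechte vorbehalten\n"

m1 = tail.fullmatch(with_extra)
m2 = tail.fullmatch(without_extra)

# Group 1 is the optional line (None when it is absent),
# group 2 is the rights notice that was found.
print(repr(m1.group(1)), repr(m1.group(2)))
print(repr(m2.group(1)), repr(m2.group(2)))
```

The negative lookahead `(?!All|Alle)` is what keeps the optional line from swallowing the rights notice itself: in the second sample the optional group cannot match, so it is skipped and the rights line is matched by the second fragment.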
Combining 1-6:
^\W+Dokument.+?[\r\n][\r\n]\W+Copyright[^\n]+\n(\W+?(?:(?!All|Alle).)+?\n)?\W+?(?:All Rights Reserved|Alle Rechte vorbehalten)\n
See DEMO
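On the flag question, a quick local check may help. The sketch below assumes Python's `re` module and an invented two-article sample (the `*` prefixes, headlines, and publisher names are all hypothetical); only the combined pattern is taken from above. It runs the same pattern with different flag combinations and counts the matches.

```python
import re

# Combined pattern copied from the answer above.
PATTERN = (
    r"^\W+Dokument.+?[\r\n][\r\n]\W+Copyright[^\n]+\n"
    r"(\W+?(?:(?!All|Alle).)+?\n)?"
    r"\W+?(?:All Rights Reserved|Alle Rechte vorbehalten)\n"
)

# Invented sample file content; the real layout may differ.
SAMPLE = (
    "* Dokument 1 von 2\n"
    "Some headline\n"
    "\n"
    "Body text line one.\n"
    "Body text line two.\n"
    "\n"
    "* Copyright 2014 Example Verlag\n"
    "* Alle Rechte vorbehalten\n"
    "* Dokument 2 von 2\n"
    "Another headline\n"
    "\n"
    "More body text.\n"
    "\n"
    "* Copyright 2013 Example Press\n"
    "* Publisher note\n"
    "* All Rights Reserved\n"
)


def find_articles(text, flags):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(PATTERN, text, flags)]


both = find_articles(SAMPLE, re.MULTILINE | re.DOTALL)
dotall_only = find_articles(SAMPLE, re.DOTALL)
multiline_only = find_articles(SAMPLE, re.MULTILINE)

print(len(both), len(dotall_only), len(multiline_only))
```

On this sample the counts come out as 2, 1 and 0. Without the dotall flag, `.+?` cannot cross the line breaks inside the article body, so nothing matches. Without the multiline flag, `^` only anchors at the very start of the file, so only the first article is found. So, at least for input shaped like this sample, both flags are needed to match several articles in one file.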
Upvotes: 1