The naive way to split a string on two consecutive line breaks (CRLF, CR or LF) would be:
import re
re.split(r'(?:\r\n|\r|\n){2}', '...')
But:
>>> re.split(r'(?:\r\n|\r|\n){2}', '\r\n\r\n\r\n')
['', '', '']
I'd like to get ['', '\r\n'] in this case. I probably need some sort of possessiveness, or to prevent the regex from backtracking. Is there a way to do that?
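To see where the extra empty string comes from, here is a small sketch that lists the spans the naive pattern matches on the same input. The names `naive`, `text` and `spans` are just for illustration.

```python
import re

naive = re.compile(r'(?:\r\n|\r|\n){2}')
text = '\r\n\r\n\r\n'

# (start, end) of every separator the naive pattern finds.
spans = [m.span() for m in naive.finditer(text)]
print(spans)               # [(0, 4), (4, 6)]

# The second match is the final CRLF alone: \r\n is tried first, no line
# break follows it, so the engine backtracks and counts \r and \n as two
# separate line breaks. Splitting on both matches yields three empty strings.
print(naive.split(text))   # ['', '', '']
```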
You can restrict where \n and \r are allowed to match by using lookarounds, so that neither of them is matched on its own when it is part of a CRLF:
r'(?:\r\n|\r(?!\n)|(?<!\r)\n){2}'
Python test:
>>> import re
>>> re.split(r'(?:\r\n|\r(?!\n)|(?<!\r)\n){2}', '\r\n\r\n\r\n')
['', '\r\n']
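If the pattern is needed in more than one place, it can be compiled once and wrapped in a helper. This is a minimal sketch: `split_paragraphs` and `PARAGRAPH_BREAK` are hypothetical names, and re.VERBOSE is used only so that each alternative can carry a comment. The second call shows the same pattern handling CRLF, LF and CR separators mixed in one string.

```python
import re

# Same pattern as above, written with re.VERBOSE so each alternative can be commented.
PARAGRAPH_BREAK = re.compile(r"""
    (?:
        \r\n        # a CRLF sequence
      | \r(?!\n)    # or a CR not followed by LF
      | (?<!\r)\n   # or an LF not preceded by CR
    ){2}            # two line breaks in a row
""", re.VERBOSE)


def split_paragraphs(text):
    """Hypothetical helper: split text on two consecutive line breaks of any style."""
    return PARAGRAPH_BREAK.split(text)


print(split_paragraphs('\r\n\r\n\r\n'))          # ['', '\r\n']
print(split_paragraphs('a\r\n\r\nb\n\nc\r\rd'))  # ['a', 'b', 'c', 'd']
```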
Details
(?:\r\n|\r(?!\n)|(?<!\r)\n){2}
- a non-capturing group (if you use a capturing one, re.split will also output the value captured in the last iteration into the resulting list) that matches two repetitions of:
  \r\n - a CRLF sequence
  | - or
  \r(?!\n) - a CR symbol not followed by LF
  | - or
  (?<!\r)\n - an LF symbol not preceded by CR.
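The note about capturing groups can be checked directly. This sketch compares three variants of the pattern on the sample string 'a\r\n\nb' (chosen for illustration because its two line breaks differ): the non-capturing group, a capturing group around the repeated alternation, and a capturing group around the whole repetition.

```python
import re

text = 'a\r\n\nb'

# Non-capturing group: only the text between separators is returned.
non_capturing = re.split(r'(?:\r\n|\r(?!\n)|(?<!\r)\n){2}', text)
print(non_capturing)   # ['a', 'b']

# Capturing the repeated group: re.split also returns the group's value,
# which holds only what the last of the two repetitions matched.
last_only = re.split(r'(\r\n|\r(?!\n)|(?<!\r)\n){2}', text)
print(last_only)       # ['a', '\n', 'b']

# Wrapping the whole repetition in a group keeps the full separator instead.
whole = re.split(r'((?:\r\n|\r(?!\n)|(?<!\r)\n){2})', text)
print(whole)           # ['a', '\r\n\n', 'b']
```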