Reputation: 922
A regular expression with an end anchor ($) completely ignores the presence of a trailing newline when matching.
Example:
import re
regex = re.compile(r'^$')
text = "\n"
print(regex.match(text))
The snippet above matches the text "\n". Since the regular expression has nothing between the start and end anchors, I expected it to match only the empty string.
Is there any way to work around this behavior?
P.S. The above code is a simplified regular expression to illustrate the problem. The actual regular expression that I'm using is:
re.compile(r'^\S(?:\S| (?!\s)){0,199}$(?<=\S)')
This also matches text that ends with a trailing newline.
Upvotes: 2
Views: 322
Reputation: 52029
Use \Z to match the end of the buffer and \A to match the beginning of the buffer.
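As a minimal sketch of the difference (the variable names dollar and strict are illustrative, not from the question), the following compares ^$ with \A\Z on the empty string and on a lone newline:

```python
import re

dollar = re.compile(r'^$')    # $ may also match just before a trailing newline
strict = re.compile(r'\A\Z')  # \Z matches only at the very end of the string

for text in ("", "\n"):
    # Columns: input, result with ^$, result with \A\Z
    print(repr(text), bool(dollar.match(text)), bool(strict.match(text)))
```

Both patterns accept the empty string, but only ^$ accepts "\n".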
Update: The reason ^$ doesn't do what you want is that $ can match in more than one place:
$ matches just before a newline that is the last character of the buffer.
$ matches at the very end of the buffer.
If the regex is compiled with re.MULTILINE, then $ will also match just before any internal newline.
Here is some code which demonstrates this:
import re

def showit(r, inp):
    # Print the start and end offsets of every match of r in inp.
    ms = r.finditer(inp)
    for i, m in enumerate(ms):
        print(" match", i, " start:", m.start(0), " end:", m.end(0))
    print("")

print("regex x$ against x\\nx")
showit(re.compile("x$"), "x\nx")
print("regex x$ against x\\nx\\n")
showit(re.compile("x$"), "x\nx\n")
print("regex x$ re.MULTILINE against x\\nx")
showit(re.compile("x$", re.MULTILINE), "x\nx")
Output:
regex x$ against x\nx
match 0 start: 2 end: 3
regex x$ against x\nx\n
match 0 start: 2 end: 3
regex x$ re.MULTILINE against x\nx
match 0 start: 0 end: 1
match 1 start: 2 end: 3
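Applied to the longer pattern from the question, the same substitution might look like the sketch below. The names original and strict and the sample strings are assumptions chosen for illustration. Only the anchors change, and the trailing lookbehind is kept so that a trailing space is still rejected.

```python
import re

# Pattern from the question, unchanged.
original = re.compile(r'^\S(?:\S| (?!\s)){0,199}$(?<=\S)')
# Same pattern with ^ and $ replaced by \A and \Z.
strict = re.compile(r'\A\S(?:\S| (?!\s)){0,199}\Z(?<=\S)')

samples = ["foo bar", "foo bar\n", "foo  bar", "foo bar "]
for text in samples:
    # Columns: input, result with ^...$, result with \A...\Z
    print(repr(text), bool(original.match(text)), bool(strict.match(text)))
```

Of these samples, only "foo bar\n" should behave differently: the original pattern accepts it, while the \A...\Z version rejects it.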
Upvotes: 7