Reputation: 5697
Screenshot showing the problem: http://adams-site.x10.mx/v/python.png
You'll notice in this image the two print statements are different colours.
It doesn't matter a great deal and I'm not really bothered, but I thought it would be nice to know why, or whether this is just a bug.
(I have seen this link, but I really would like to know why.)
Upvotes: 1
Views: 3634
Reputation: 17920
This is the code responsible for syntax highlighting from ColorDelegator.py:
def any(name, alternates):
    "Return a named group pattern matching list of alternates."
    return "(?P<%s>" % name + "|".join(alternates) + ")"

def make_pat():
    kw = r"\b" + any("KEYWORD", keyword.kwlist) + r"\b"
    builtinlist = [str(name) for name in dir(__builtin__)
                   if not name.startswith('_')]
    # self.file = file("file") :
    # 1st 'file' colorized normal, 2nd as builtin, 3rd as string
    builtin = r"([^.'\"\\#]\b|^)" + any("BUILTIN", builtinlist) + r"\b"
    comment = any("COMMENT", [r"#[^\n]*"])
    sqstring = r"(\b[rRuU])?'[^'\\\n]*(\\.[^'\\\n]*)*'?"
    dqstring = r'(\b[rRuU])?"[^"\\\n]*(\\.[^"\\\n]*)*"?'
    sq3string = r"(\b[rRuU])?'''[^'\\]*((\\.|'(?!''))[^'\\]*)*(''')?"
    dq3string = r'(\b[rRuU])?"""[^"\\]*((\\.|"(?!""))[^"\\]*)*(""")?'
    string = any("STRING", [sq3string, dq3string, sqstring, dqstring])
    return kw + "|" + builtin + "|" + comment + "|" + string +\
           "|" + any("SYNC", [r"\n"])
It builds up one large regular expression, which it uses to find the items to colour. In particular, the regex defined as kw will match a keyword (as listed by the keyword module) anywhere it's found in the source file, while the regex defined as builtin will match a builtin (as discovered by scanning __builtin__) as long as it doesn't follow a period, single quote, double quote, backslash or hash symbol.
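To make the mechanism concrete, here is a minimal sketch of the same named-group approach. It is not IDLE's code: named_group is a renamed copy of the any() helper quoted above, the word lists are shortened stand-ins I made up, and classify is a hypothetical helper that reports which group each match landed in.

```python
import re

def named_group(name, alternates):
    # Same idea as the any() helper quoted above (renamed so it does not
    # shadow the builtin any()).
    return "(?P<%s>" % name + "|".join(alternates) + ")"

# Assumption: tiny illustrative word lists. IDLE itself uses keyword.kwlist
# and the names found in the builtin module.
kw = r"\b" + named_group("KEYWORD", ["if", "else", "return"]) + r"\b"
builtin = r"([^.'\"\\#]\b|^)" + named_group("BUILTIN", ["len", "open"]) + r"\b"
prog = re.compile(kw + "|" + builtin)

def classify(source):
    """Return (group name, matched text) pairs for each match in source."""
    found = []
    for m in prog.finditer(source):
        for name, text in m.groupdict().items():
            if text:
                found.append((name, text))
    return found
```

With these lists, classify("if x: return len(y)") labels if and return as KEYWORD and len as BUILTIN, while classify("obj.len") finds nothing because len follows a period.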
Now, a combination of factors is at work to give the strange behaviour you see. First of all, in Python 2.7 print is both a keyword and a builtin. (I'm not sure why, but I imagine it might be to keep closer to Python 3.0, where print is a builtin and not a keyword.) So a regex is constructed that can match print as either a keyword or a builtin. But why does it sometimes match as one and sometimes as the other?
The difference is due to the construction of the regex. At the start of a line, the kw regex matches from the first character, and because it comes first in the alternation it wins before the rest can be considered. However, after the start of the line, the builtin regex actually starts matching one character earlier, because the first thing it looks for is "any character that isn't a period, quote, double-quote, backslash or hash". Even though that character isn't included in the labelled group, it's still part of the match. The regex engine reports the match that starts leftmost, so when print is preceded by a space or tab, the builtin regex matches first.
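The effect can be reproduced with a cut-down pattern. This is only a sketch: the two word lists are illustrative assumptions with print deliberately placed in both (as in Python 2.7), and first_label is a hypothetical helper, not something IDLE provides.

```python
import re

# Simplified stand-ins for the kw and builtin patterns, with "print" in both
# lists. The word lists are assumptions made for illustration.
kw = r"\b(?P<KEYWORD>print|if)\b"
builtin = r"([^.'\"\\#]\b|^)(?P<BUILTIN>print|len)\b"
prog = re.compile(kw + "|" + builtin)

def first_label(line):
    """Return (match start, name of the group that matched) for the first match."""
    m = prog.search(line)
    label = next(name for name, text in m.groupdict().items() if text)
    return m.start(), label
```

first_label("print 'a'") gives (0, "KEYWORD"), whereas first_label("    print 'a'") gives (3, "BUILTIN"): the match begins at the space just before print, one position earlier than the keyword alternative could start.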
One way to fix this would be to use a negative lookbehind assertion, but such a complicated regular expression already makes me a bit nervous and I'm never sure which regex features can result in catastrophic performance degradation. A simpler fix is to filter out any builtins that are also keywords before constructing the regex, and that's exactly what has been done in Python 3.2.2, as described in the bug report linked to from the question you reference.
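Both fixes can be sketched on the same cut-down pattern. Everything here is an assumption for illustration: KEYWORDS and BUILTINS are made-up short lists, make_prog and label are hypothetical helpers, and the lookbehind prefix is my rough equivalent of the original prefix rather than anything taken from IDLE.

```python
import re

KEYWORDS = ["print", "if"]    # illustrative; IDLE uses keyword.kwlist
BUILTINS = ["print", "len"]   # illustrative; IDLE scans the builtin module

def group(name, alternates):
    return "(?P<%s>" % name + "|".join(alternates) + ")"

def make_prog(builtins, prefix):
    kw = r"\b" + group("KEYWORD", KEYWORDS) + r"\b"
    builtin = prefix + group("BUILTIN", builtins) + r"\b"
    return re.compile(kw + "|" + builtin)

def label(prog, line):
    """Name of the group that produced the first match in line."""
    m = prog.search(line)
    return next(name for name, text in m.groupdict().items() if text)

ORIGINAL_PREFIX = r"([^.'\"\\#]\b|^)"

# Fix 1: drop builtins that are also keywords before building the regex.
filtered = make_prog([b for b in BUILTINS if b not in KEYWORDS], ORIGINAL_PREFIX)

# Fix 2: a zero-width negative lookbehind, so the preceding character is
# checked but not consumed and the builtin match no longer starts earlier.
lookbehind = make_prog(BUILTINS, r"(?<![.'\"\\#])\b")
```

With either variant, label(prog, "    print 'a'") comes out as "KEYWORD", and a genuine builtin such as len in " len(x)" is still labelled "BUILTIN".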
Upvotes: 5
Reputation: 35079
According to the bug report linked in the previous question you mention, IDLE was getting confused by True, False, and None, which weren't keywords for a long while but became keywords by Python 3.0. Before becoming keywords, they were just names in the built-in global namespace. So IDLE would colour them inconsistently, as builtins in some contexts and as keywords in others.
print has undergone exactly the opposite transformation: it was a keyword up until 3.0, after which it is merely a builtin (since it is now a function rather than a statement). So IDLE colours it both ways, depending on which of the two it decides applies. This appears to be resolved by the same patch (which is only in 3.2 and later, not in any 2.x branch): print is coloured purple consistently.
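If you want to check which category a name falls into on your own interpreter, something like the following works. It is a small sketch that assumes Python 3, where the module is called builtins (it is __builtin__ in Python 2); classify_name is a hypothetical helper.

```python
import builtins
import keyword

def classify_name(name):
    """Report whether name is a keyword and/or a builtin on the running interpreter."""
    return {"keyword": keyword.iskeyword(name), "builtin": hasattr(builtins, name)}
```

On Python 3, classify_name("print") reports a builtin that is not a keyword, while classify_name("True") reports both, which is the kind of overlap that made the colouring inconsistent.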
Upvotes: 5