How does code coloring work?

Question

How do code coloring engines work, exactly? Do they just generate a parse tree that preserves whitespace, color the leaves, and reconstruct the original program? How does live code coloring manage to be efficient enough to do it on the fly?

hmakholm left over Monica · Accepted Answer

Most syntax highlighters I know of do not work from the syntax tree; they just tokenize the source and color the text according to the kind of token each piece forms. The most difficult task such a highlighter faces is recognizing multi-line comments (and/or strings, if the language allows them), because that requires carrying state from one line to the next; everything else can be handled within a single source line.
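To make the token-based approach concrete, here is a minimal sketch in Python for a made-up C-like language. The names `KEYWORDS`, `TOKEN_RE`, `highlight_line` and `highlight` are invented for this illustration and are not taken from any real editor. The only thing remembered between lines is whether we are inside a `/* ... */` comment.

```python
import re

# Hypothetical keyword set for a small C-like language (an assumption).
KEYWORDS = {"if", "else", "while", "return", "int"}

# One alternative per token kind; anything unmatched (operators,
# whitespace, punctuation) is simply left uncolored.
TOKEN_RE = re.compile(r"""
    (?P<comment_start>/\*)
  | (?P<line_comment>//.*)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<number>\d+)
  | (?P<word>[A-Za-z_]\w*)
""", re.VERBOSE)


def highlight_line(line, in_comment=False):
    """Return (spans, in_comment_after) for a single source line.

    spans is a list of (start, end, kind) tuples. in_comment is the one
    bit of state carried over from the previous line.
    """
    spans = []
    pos = 0
    if in_comment:
        close = line.find("*/")
        if close == -1:
            # The whole line is still inside the comment.
            return ([(0, len(line), "comment")] if line else []), True
        spans.append((0, close + 2, "comment"))
        pos = close + 2
    while pos < len(line):
        m = TOKEN_RE.search(line, pos)
        if m is None:
            break
        kind = m.lastgroup
        if kind == "comment_start":
            close = line.find("*/", m.end())
            if close == -1:
                # Comment runs past the end of this line.
                spans.append((m.start(), len(line), "comment"))
                return spans, True
            spans.append((m.start(), close + 2, "comment"))
            pos = close + 2
            continue
        if kind == "line_comment":
            kind = "comment"
        elif kind == "word":
            kind = "keyword" if m.group() in KEYWORDS else "identifier"
        spans.append((m.start(), m.end(), kind))
        pos = m.end()
    return spans, False


def highlight(lines):
    """Highlight a list of lines, threading the comment state through."""
    in_comment = False
    result = []
    for line in lines:
        spans, in_comment = highlight_line(line, in_comment)
        result.append(spans)
    return result
```

Because each line needs only the state left by the line before it, an editor built along these lines could presumably re-highlight from the edited line downward and stop once the carried state matches what it had cached, which would be one plausible way to keep live coloring cheap.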

Automatic indentation engines are more involved. In theory the best results would come from reconstructing a full syntax tree, but that is slow and raises problems of error handling (because most programs are not even well-formed while they're being edited). Instead they use various kinds of simplified scanning and heuristics, which don't always manage to match the true syntax of the language.
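As a rough illustration of such a heuristic (not how any particular editor does it), the sketch below guesses the indentation of the next line by counting unmatched brackets in the preceding lines. The function name `suggest_indent` and the `indent_width` parameter are made up for this example.

```python
def suggest_indent(previous_lines, indent_width=4):
    """Guess the indentation (in spaces) for the line after previous_lines.

    Heuristic: count opening and closing brackets, ignoring everything
    after a `//` on each line. No real parsing is done, so brackets
    inside string literals are miscounted.
    """
    depth = 0
    for line in previous_lines:
        code = line.split("//", 1)[0]  # crude line-comment stripping
        depth += sum(code.count(c) for c in "{([")
        depth -= sum(code.count(c) for c in "})]")
        depth = max(depth, 0)  # tolerate stray closers in half-edited code
    return depth * indent_width
```

This never fails on malformed input, which matters while code is being edited, but it is easy to fool: for the single line `s = "{";` it suggests one extra level of indentation, because it cannot tell that the brace is inside a string.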

(Edit: on further thought, this is not completely true. For example, Eclipse's Java editor will also change the color of identifiers according to whether they name local variables, instance fields or static variables/methods. This happens in a separate pass from the basic lexical highlighting, after the editor has parsed and typechecked the code for live cross-referencing.)
