Khaled Alshaya

Reputation: 96849

Where does the compiler spend most of its time during parsing?

I read in Sebesta's book that the compiler spends most of its time lexing the source code, so optimizing the lexer is a necessity, unlike the syntax analyzer.

If this is true, why does the lexical analysis stage take so much time compared to syntax analysis in general?

By syntax analysis I mean the derivation process.

Upvotes: 3

Views: 968

Answers (4)

Matt

Reputation:

It certainly used to be the case that lexing was expensive. Part of that had to do with limited memory and having to do multiple file operations to read in bits of the program. Now that memory is measured in GB, this is no longer an issue, and for the same reason a lot more work can be done, so optimization is more important. Of course, whether the optimization helps much is another question.

Upvotes: 0

Imagist

Reputation: 18514

It really depends on where you draw the line between lexing and parsing. I tend to have a very limited view of what a token is, and as a result my parsers spend a lot more time on parsing than on lexing, not because the lexers are faster, but because they simply do less.
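As a rough illustration of that trade-off (both token sets below are hypothetical, just to show where the line can be drawn): with a "rich" token set the lexer does the keyword and operator classification itself, while with a "limited" token set that classification moves into the parser, so more of the total time shows up under parsing even though the overall work is similar.

#include <iostream>

// Rich tokens: the lexer classifies keywords, operators, etc. itself.
enum class RichToken { KwForeach, KwIn, LParen, RParen, Identifier };

// Limited tokens: the lexer only separates words from punctuation,
// and the parser decides later which words are keywords.
enum class LimitedToken { Word, Punct };

int main() {
    // "foreach (x in o)" as seen by each design:
    // rich:    KwForeach LParen Identifier KwIn Identifier RParen
    // limited: Word      Punct  Word       Word Word       Punct
    std::cout << "same six tokens, but a different amount of lexer work\n";
}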

Upvotes: 0

DavidMWilliams

Reputation: 854

Lexical analysis is the process whereby all the characters in the source code are converted to tokens. For instance

foreach (x in o)

is read character by character - "f", "o", etc.

The lexical analyser must determine which keywords are being seen ("foreach", not "for", and so on).

By the time syntactic analysis occurs the program code is "just" a series of tokens. That said, I agree with the answer above that lexical analysis is not necessarily the most time-consuming process, just that it has the biggest stream to work with.
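To make that concrete, here is a minimal hand-written lexer sketch (hypothetical code, not taken from any particular compiler). It walks the input character by character, uses longest match so that "foreach" is not read as "for", and hands the parser a short list of tokens instead of the raw character stream.

#include <cctype>
#include <iostream>
#include <string>
#include <vector>

struct Token {
    enum Kind { Keyword, Identifier, Punct } kind;
    std::string text;
};

std::vector<Token> lex(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        if (std::isspace(static_cast<unsigned char>(src[i]))) { ++i; continue; }
        if (std::isalpha(static_cast<unsigned char>(src[i]))) {
            std::string word;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i])))
                word += src[i++];
            // Longest match decides the keyword: "foreach" is not "for" plus "each".
            bool keyword = (word == "foreach" || word == "for" || word == "in");
            tokens.push_back({keyword ? Token::Keyword : Token::Identifier, word});
        } else {
            tokens.push_back({Token::Punct, std::string(1, src[i++])});
        }
    }
    return tokens;
}

int main() {
    for (const Token& t : lex("foreach (x in o)"))
        std::cout << t.text << '\n';   // foreach ( x in o ) -- six tokens from sixteen characters
}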

Upvotes: 1

Martin v. Löwis

Reputation: 127447

First, I don't think it actually is true: in many compilers, most time is not spent lexing the source code. For example, in C++ compilers (e.g. g++), most time is spent in semantic analysis, in particular in overload resolution (trying to find out what implicit template instantiations to perform). Also, in C and C++, most time is often spent in optimization (creating graph representations of individual functions or the whole translation unit, and then running long algorithms on these graphs).

When comparing lexical and syntactical analysis, it may indeed be the case that lexical analysis is more expensive. This is because both use state machines, i.e. there is a fixed number of actions per element, but the number of elements is much larger in lexical analysis (characters) than in syntactical analysis (tokens).
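As a rough sketch of that argument (the state table below is made up, not taken from a real lexer): a table-driven lexer does one constant-cost table lookup per character, and a table-driven (LL/LR) parser does a comparably constant-cost action per token, but a token averages several characters, so the lexer has far more elements to step over.

#include <cctype>
#include <iostream>
#include <string>

// Hypothetical 3-state DFA: state 0 = start, 1 = in identifier, 2 = in number.
// Columns: 0 = letter, 1 = digit, 2 = anything else.
constexpr int kTransition[3][3] = {
    {1, 2, 0},   // from start
    {1, 1, 0},   // from identifier
    {2, 2, 0},   // from number (digits only, for simplicity)
};

int classify(char c) {
    if (std::isalpha(static_cast<unsigned char>(c))) return 0;
    if (std::isdigit(static_cast<unsigned char>(c))) return 1;
    return 2;
}

int main() {
    std::string input = "foreach (x in o)";
    int state = 0;
    std::size_t steps = 0;
    for (char c : input) {
        state = kTransition[state][classify(c)];  // one table lookup per character
        ++steps;
    }
    std::cout << "characters stepped over by the lexer: " << steps << '\n';  // 16
    // A table-driven parser does a similarly fixed-cost action per element,
    // but its elements are the ~6 tokens of this line, not its 16 characters.
}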

Upvotes: 8
