jmasterx

Reputation: 54103

How do C/C++ compilers know which line an error is on?

There is probably a very obvious answer to this, but I was wondering how the compiler knows which line of code my error is on. In some cases it even knows the column.

The only way I can think to do this is to tokenize the input string into a 2D array. This would store [lines][tokens].

C/C++ could be tokenized into 1 long 1D array which would probably be more efficient. I am wondering what the usual parsing method would be that would keep line information.

Upvotes: 7

Views: 329

Answers (2)

Alexander Oh

Reputation: 25621

Actually, most of this is covered in the Dragon Book. Compilers perform lexing and parsing, i.e. they transform the source code into a tree representation. While doing so, each keyword, variable, etc. is associated with a line and column number.

However, during parsing the exact origin of the failure might get lost, so the reported location might be off.
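As a rough illustration of the bookkeeping, here is a minimal sketch of a scanner that keeps one flat list of tokens (no 2D array needed) and stamps each token with the line and column of its first character. The `Token` struct and `tokenize` function are hypothetical names made up for this example, and the token rules are far simpler than real C/C++; a tab also counts as a single column here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type: the text plus where it started (1-based).
struct Token {
    std::string text;
    int line;
    int column;
};

// Splits the source into identifier/number tokens and single-character
// punctuation, recording the position of each token's first character.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    int line = 1;
    int column = 1;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (c == '\n') {
            // Newline: the next character is column 1 of the next line.
            ++line;
            column = 1;
            ++i;
        } else if (std::isspace(c)) {
            // Other whitespace is skipped but still advances the column.
            ++column;
            ++i;
        } else if (std::isalnum(c) || c == '_') {
            // Identifier or number: remember where it starts, then consume it.
            int startColumn = column;
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) {
                ++i;
                ++column;
            }
            tokens.push_back({src.substr(start, i - start), line, startColumn});
        } else {
            // Anything else is treated as a one-character token.
            tokens.push_back({std::string(1, src[i]), line, column});
            ++i;
            ++column;
        }
    }
    return tokens;
}
```

Calling `tokenize("int x;\n  y = 2;")` gives a flat vector in which, for example, the token `y` carries line 2 and column 3, so any later phase that complains about a token can simply print the position stored on it.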

Upvotes: 6

Marco A.

Reputation: 43662

This is the first step on the long, complicated path towards "Engineering a Compiler", or compiler theory.

The short answer is: there's a module called the "front-end" that usually takes care of several phases:

  1. Scanning
  2. Parsing
  3. IR generator
  4. IR optimizer ...

The structure isn't fixed, so each compiler will have its own set of modules, but more or less the steps involved in front-end processing are:

Scanning - maps the character stream into words, or tokens (and also discards whitespace and comments)

Parsing - this is where syntax and (some) semantic analysis take place and where syntax errors are reported

To sum this up: the compiler knows the location of your error because when something doesn't fit into a structure called the "abstract syntax tree" (i.e. it cannot be constructed) or doesn't follow any of the syntax-directed translation rules, well... there's something wrong, and the compiler reports the location where the match failed. If the grammar error involves just one word/token, then even a precise column location can be returned, since that token failed to match the expected terminal: a basic token like the if keyword in the C/C++ language.
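To make that concrete, here is a small sketch of a checker for a made-up toy grammar (`identifier = number ;` repeated), not real C/C++. It assumes the scanner has already attached a line and column to every token; `Token`, `checkSyntax` and the helper functions are hypothetical names for this example only. The first token that doesn't fit the expected structure is the one whose stored position ends up in the message.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical token type produced by a scanner (positions are 1-based).
struct Token {
    std::string text;
    int line;
    int column;
};

static bool isNumber(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (!std::isdigit(c)) return false;
    }
    return true;
}

static bool isIdentifier(const std::string& s) {
    if (s.empty()) return false;
    unsigned char first = static_cast<unsigned char>(s[0]);
    if (!(std::isalpha(first) || first == '_')) return false;
    for (unsigned char c : s) {
        if (!(std::isalnum(c) || c == '_')) return false;
    }
    return true;
}

// Toy grammar (an assumption for this sketch):
//   program := { identifier '=' number ';' }
// Returns an empty string if the tokens fit, otherwise a diagnostic that
// carries the position of the first token that did not fit.
std::string checkSyntax(const std::vector<Token>& tokens) {
    static const char* const expected[] = {"identifier", "'='", "number", "';'"};
    int state = 0;  // index into expected[]: which part of a statement comes next
    for (const Token& tok : tokens) {
        bool ok = (state == 0) ? isIdentifier(tok.text)
                : (state == 1) ? tok.text == "="
                : (state == 2) ? isNumber(tok.text)
                               : tok.text == ";";
        if (!ok) {
            return std::to_string(tok.line) + ":" + std::to_string(tok.column) +
                   ": error: expected " + expected[state] +
                   " before '" + tok.text + "'";
        }
        state = (state + 1) % 4;
    }
    if (state != 0) {
        return "error: expected " + std::string(expected[state]) + " at end of input";
    }
    return "";
}
```

Feeding it the tokens of `x = 1` followed on the next line by `y = 2;` yields `2:1: error: expected ';' before 'y'`. The semicolon is actually missing at the end of line 1, but the mismatch is only noticed when the next token arrives, which is one reason the location a compiler reports can be a little off from where the mistake really is.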

If you want to know more about this topic, my suggestion is to start with the classic academic approach of the "Compiler Book" or "Dragon Book" and then, later on, possibly study an open-source front-end like Clang.

Upvotes: 3
