user249375


Compiler - front end back end

I understand the structure of a compiler with regard to its front-end and back-end. However, I am not sure why compilers are so often divided this way. I am sure there are many reasons; can you give me a few? Most books and websites tell you what the front-end and back-end are, but fail to tell you why they are separated.

Thank you.

Upvotes: 48

Views: 32034

Answers (3)

user207421

Reputation: 311054

The front-end deals with the language itself: scanning, parsing, and building the parse tree. The back-end deals with the target system: object code formats, the machine code itself, and so on. The two don't have all that much to do with each other, and for a portable compiler it is highly desirable to use the same front-end with multiple back-ends, one per target.

You can take that further, as gcc does, and have a front-end/back-end interface that is language-independent, so you can use different language front-ends with the same back-end. In the old days this was called the MxN problem: you don't want to have to write MxN compilers where you have M languages and N target systems. The idea is to only have to write M front-ends and N back-ends, i.e. M+N components instead of MxN complete compilers.

The MxN problem means MxN compilers for M languages and N target machines.
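
To make the M+N idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the tuple-based IR, the two toy "languages" (infix and postfix assignment) and the two toy "targets" (text mnemonics and numeric opcodes) are assumptions, not how gcc or any real compiler is organised.

```python
# Shared IR (an assumption for this sketch): a list of (opcode, operand) tuples.

def parse_infix(src):
    """Hypothetical front end for statements like 'a = b + c;'."""
    dest, expr = src.rstrip(";").split("=")
    left, right = expr.split("+")
    return [("load", left.strip()), ("load", right.strip()),
            ("add", None), ("store", dest.strip())]

def parse_postfix(src):
    """Hypothetical front end for statements like 'b c + a ='."""
    left, right, plus, dest, assign = src.split()
    if (plus, assign) != ("+", "="):
        raise ValueError("unsupported statement: " + src)
    return [("load", left), ("load", right), ("add", None), ("store", dest)]

def emit_text(ir):
    """Hypothetical back end: upper-case mnemonics, one per line."""
    return "\n".join((op.upper() + " " + (arg or "")).strip() for op, arg in ir)

def emit_numeric(ir):
    """Hypothetical back end: made-up numeric opcodes."""
    codes = {"load": 1, "add": 2, "store": 3}
    return [(codes[op], arg) for op, arg in ir]

FRONT_ENDS = {"infix": parse_infix, "postfix": parse_postfix}  # M = 2
BACK_ENDS = {"text": emit_text, "numeric": emit_numeric}       # N = 2

def compile_source(language, target, src):
    """Any front end can be paired with any back end through the IR."""
    ir = FRONT_ENDS[language](src)
    return BACK_ENDS[target](ir)
```

Four functions were written (M+N = 2+2), yet compile_source accepts all four language/target pairs (MxN = 2x2). Adding a third language means writing one more front end, not one more compiler per target.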

Upvotes: 89

Andrew Cooper

Reputation: 32596

If you're talking about the front-end being the parser which tokenises the source code, and back-end being the bit which generates executable code based on the tokenised code, then one very good reason is this: portability.

Separating the parser from the executable code generation makes it much easier to port a compiler from one processor architecture to another.

Upvotes: 9

old_timer

Reputation: 71606

Because you want to use some sort of internal pseudo code or tables/data structures. For example if you have some line of code:

a = b + c;

You would want to take that and break it into an intermediate language or IR (Intermediate representation):

load b
load c
add b + c
store a

That is just one example; there are many possible solutions. An intermediate language is better than going straight to assembly for a particular target for a number of reasons:

  • By abstracting the hardware and providing a "logical number" of registers, we are independent of the final "physical number" of registers and of the hardware layout. For example, the native ADD instruction may be stack based, or take one, two, or even three operands. At this higher level we don't need to know, or care, about the lower-level hardware implementation.
  • The intermediate language can be optimized, if you have an optimizer.
  • It is generic enough to be used for multiple targets, if you want to support different processors.
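
As a rough illustration of the first and last points, the sketch below lowers one IR for a = b + c to two imaginary targets: a stack machine whose ADD takes no operands, and a register machine whose ADD takes three. The IR layout, the virtual register names (t0, t1, ...) and the mnemonics (push, pop, ld, st) are all assumptions made up for this example, and the stack lowering only works because this straight-line IR already evaluates its operands in stack order.

```python
# IR for "a = b + c" using virtual ("logical") registers t0, t1, t2.
IR = [
    ("load", "t0", "b"),         # t0 <- b
    ("load", "t1", "c"),         # t1 <- c
    ("add", "t2", "t0", "t1"),   # t2 <- t0 + t1
    ("store", "a", "t2"),        # a  <- t2
]

def lower_to_stack(ir):
    """Hypothetical stack-machine target: virtual registers disappear."""
    out = []
    for instr in ir:
        op = instr[0]
        if op == "load":
            out.append("push " + instr[2])
        elif op == "add":
            out.append("add")
        elif op == "store":
            out.append("pop " + instr[1])
        else:
            raise ValueError("unknown IR opcode: " + op)
    return out

def lower_to_three_operand(ir):
    """Hypothetical register target: map each virtual register to r0, r1, ..."""
    physical = {}

    def reg(virtual):
        # Naive allocation: first use gets the next free physical register.
        if virtual not in physical:
            physical[virtual] = "r" + str(len(physical))
        return physical[virtual]

    out = []
    for instr in ir:
        op = instr[0]
        if op == "load":
            out.append("ld {}, {}".format(reg(instr[1]), instr[2]))
        elif op == "add":
            lhs, rhs = reg(instr[2]), reg(instr[3])
            out.append("add {}, {}, {}".format(reg(instr[1]), lhs, rhs))
        elif op == "store":
            out.append("st {}, {}".format(instr[1], reg(instr[2])))
        else:
            raise ValueError("unknown IR opcode: " + op)
    return out
```

The IR itself never mentions how many operands ADD takes or how many registers exist; only the two lowering functions know that. A real back end would need a proper register allocator, since this one simply hands out a new register for every virtual register.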

I don't know enough about it, but I think the commonly used parser tools, bison/flex, also help you boil the source down into some sort of intermediate code/instruction set, and then you write a backend for that.

Another benefit is that you can have, for example, a C front end, a C++ front end, and front ends for other languages, without affecting the backend.

You also benefit from breaking the compiler into logical modules: you can develop and test the front end independently from the back end. LLVM, for example, allows the intermediate language to be exported and imported. If you really, really wanted to, you could write code directly in the intermediate language and still have the benefit of multiple targets on the backend.
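
A small sketch of what exporting and importing an intermediate language buys you. The one-instruction-per-line text format and the function names below are invented for this example; this is not LLVM's IR format or API. The tiny interpreter stands in for a back end.

```python
def dump_ir(ir):
    """Export: one instruction per line, fields separated by spaces."""
    return "\n".join(" ".join(instr) for instr in ir)

def load_ir(text):
    """Import: parse the text form back into tuples, skipping blank lines."""
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]

def run(ir, env):
    """Stand-in back end: interpret the IR on a small value stack."""
    stack = []
    for instr in ir:
        op = instr[0]
        if op == "load":
            stack.append(env[instr[1]])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "store":
            env[instr[1]] = stack.pop()
        else:
            raise ValueError("unknown IR opcode: " + op)
    return env

# Hand-written IR: no front end involved at all.
HAND_WRITTEN = """
load b
load c
add
store a
"""

result = run(load_ir(HAND_WRITTEN), {"b": 2, "c": 3})
```

Because the IR has a text form, a front end can be tested by comparing its dump_ir output against an expected string, and a back end can be tested by feeding it hand-written IR like the one above, without the other half of the compiler being present.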

Upvotes: 5
