yoyo

Reputation: 1133

How do C/C++ parsers work?

I've spent a lot of time researching how PHP's parser works:

it ultimately translates PHP code into C code.

But how is C code translated into executables?

BTW, how can one judge, from a mathematical standpoint, whether language A can be converted to language B?

Upvotes: 3

Views: 5180

Answers (4)

Dhruv Gairola

Reputation: 9182

From my understanding, your main problem is with the compilation process. As you mentioned in your comments, you're confusing a parser with a compiler. Let me help you a bit:

[Image not preserved: diagram of the steps in the compilation process]

Parsing is just one of the steps in the compilation process. To answer your question, you first need to understand how compilers work. Most compilers go through the steps shown above, and understanding them takes a bit of work. If you want to delve further, read the lectures from this link. The source and target code depend on the context: usually the source code is a high-level language and the target code is machine code.

Upvotes: 6

sarnold

Reputation: 104020

PHP doesn't really 'translate' to C code. The PHP interpreter is itself a running executable: a state machine that knows how to carry out every PHP construct, and it executes your PHP directly. No intermediate C is needed or desired. Because it is interpreted, the PHP program is re-evaluated every time the PHP executable runs it.

The PHP interpreter is written in C, but it could have been in C++ or Assembly or Pascal or Erlang or bash or Java or whatever else you might wish. (I think it started out in Perl, but my memory is getting fuzzy.)

C is compiled with a compiler, which is run once before the program may be run thousands of times. Most C compilers work in several phases: lexing the input into tokens, parsing the tokens into an abstract syntax tree, then walking that tree to build symbol tables for each scope. The program, often lowered to an intermediate representation such as static single assignment form, is then subjected to optimizations such as dead code removal, and the result is passed to a code generator, which emits an object file for the target architecture. The linker then resolves references to objects (functions and variables in C) not defined in that specific translation unit, producing an executable that the loader can load at run time.
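To make those phases a bit more concrete, here is a minimal sketch in Python for a toy expression language (integers, variable names, `+`, `*`, parentheses). Everything in it is made up for illustration: the names `lex`, `parse`, `fold`, `generate`, and the `PUSH`/`LOAD`/`ADD`/`MUL` stack instructions are my own assumptions, not what any real C compiler does internally. It stops before object files and linking.

```python
import re

# Token patterns for the toy language (an assumption of this sketch).
TOKEN_RE = re.compile(
    r"(?P<NUM>[0-9]+)|(?P<NAME>[A-Za-z_][A-Za-z_0-9]*)|(?P<OP>[+*()])|(?P<BAD>\S)"
)

def lex(source):
    """Lexing: turn the character stream into (kind, value) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "BAD":
            raise SyntaxError(f"unexpected character {text!r}")
        tokens.append((kind, int(text) if kind == "NUM" else text))
    return tokens

def parse(tokens):
    """Parsing: build a tree of tuples, giving '*' higher precedence than '+'."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", None)

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expr():
        node = term()
        while peek() == ("OP", "+"):
            advance()
            node = ("add", node, term())
        return node

    def term():
        node = factor()
        while peek() == ("OP", "*"):
            advance()
            node = ("mul", node, factor())
        return node

    def factor():
        kind, value = advance()
        if kind == "NUM":
            return ("num", value)
        if kind == "NAME":
            return ("var", value)
        if (kind, value) == ("OP", "("):
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return tree

def fold(node):
    """Optimization: evaluate constant subexpressions at compile time."""
    if node[0] in ("num", "var"):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        value = left[1] + right[1] if op == "add" else left[1] * right[1]
        return ("num", value)
    return (op, left, right)

def generate(node):
    """Code generation: emit instructions for a hypothetical stack machine."""
    if node[0] == "num":
        return [f"PUSH {node[1]}"]
    if node[0] == "var":
        return [f"LOAD {node[1]}"]
    return generate(node[1]) + generate(node[2]) + [node[0].upper()]

def compile_expr(source):
    return generate(fold(parse(lex(source))))

print(compile_expr("2 * 3 + x"))
```

For `2 * 3 + x` the constant part is folded before any code is generated, so the output is `['PUSH 6', 'LOAD x', 'ADD']`. A real compiler does the same kind of thing on a much larger scale and targets actual machine instructions.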

The Dragon Book (Compilers: Principles, Techniques, and Tools) is the usual go-to source for learning about compilers, but I recommend Language Implementation Patterns from the Pragmatic Bookshelf instead.

Upvotes: 7

templatetypedef

Reputation: 372704

This is a really great and really deep question that draws on a lot of parts of computer science.

Ultimately, all programs on a computer execute by issuing instructions to the processor in machine code. There is no one "machine code"; every processor family has its own set of instructions that it can execute. These are usually low-level operations like "load a value from memory" or "add two values together." In theory, every program could be written directly in machine code, but this is rarely done. Machine code is essentially a series of zeros and ones that the processor decodes in particular ways, and it would be all but impossible to build any complex system directly this way.

One step above machine code is assembly language, a very low-level macro-esque language that usually has a one-to-one mapping with the machine code. For example, you might have commands like "add" that do addition, "sub" for subtraction, or "call" for function calls. Ultimately, the code is converted to machine code using an assembler, a program that translates the assembly to machine code. It's possible to build big complex systems in assembly, but it's very difficult.

Many programming languages like C and C++ are compiled, which means that a special program called the compiler translates the source code down into assembly language, which can then be converted directly to machine code. In this way, you can write code that works at a high level (it can have variables, functions, objects, templates, exceptions, etc.) but that still runs directly on the machine hardware. Other programming languages are interpreted, which means that a special program called an interpreter parses the source code, builds up some in-memory representation of it, and then executes it, either indirectly (the interpreter's own machine code carries out each operation the program asks for) or directly (by generating machine code on an as-needed basis, as just-in-time compilers do).
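Here is a minimal sketch of that difference in Python, using a hand-built tree for the expression `2 * x + 1`. The tuple format and the names `interpret` and `compile_closure` are my own assumptions. Note the stand-in: a real compiler emits assembly or machine code, whereas here the "target" is a Python function, just to keep the sketch runnable.

```python
def interpret(node, env):
    """Interpreter: re-inspects the tree every time the program is run."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    left = interpret(node[1], env)
    right = interpret(node[2], env)
    return left + right if kind == "add" else left * right

def compile_closure(node):
    """'Compiler': inspects the tree once and returns something runnable."""
    kind = node[0]
    if kind == "num":
        value = node[1]
        return lambda env: value
    if kind == "var":
        name = node[1]
        return lambda env: env[name]
    left = compile_closure(node[1])
    right = compile_closure(node[2])
    if kind == "add":
        return lambda env: left(env) + right(env)
    return lambda env: left(env) * right(env)

# In-memory representation of: 2 * x + 1
tree = ("add", ("mul", ("num", 2), ("var", "x")), ("num", 1))

print(interpret(tree, {"x": 5}))   # the tree is walked on this call
compiled = compile_closure(tree)   # translation happens once, here
print(compiled({"x": 5}))          # later runs never look at the tree
print(compiled({"x": 10}))
```

Both approaches give the same answers (11, 11, 21). The difference is when the work of understanding the program happens: on every run for the interpreter, once up front for the compiler.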

The theory of how to convert from one language into another has been extensively studied. There are many challenges, ranging from "how do you even look at the source code of the program and understand what you're looking at?" to "what is the most efficient way to convert this program into some other language?" The former involves lexing, parsing, and semantic analysis; the latter involves optimization and code generation.

Typically, a program in any language can be converted into an equivalent program in another language, though there can be a notable loss in efficiency. Some programming languages have special functions that access underlying hardware, and programs using them can't be rewritten in languages that lack access to that hardware, but this situation is rare. One typical measure of whether a program can be rewritten in another language is to ask whether the two languages are Turing-complete, a mathematical term meaning that the language is expressive enough to compute anything a Turing machine can, and therefore anything any other programming language can.

Hope this helps!

Upvotes: 10

chrisaycock

Reputation: 37930

how can one judge, from a mathematical standpoint, whether language A can be converted to language B?

If both languages are Turing complete, then any program in one can be translated into an equivalent program in the other.
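A small demonstration of what such a translation can look like: a minimal sketch in Python that mechanically converts Brainfuck (a tiny language usually cited as Turing complete) into Python source text. The function names `bf_to_python` and `load` are my own, and the fixed 30000-cell tape is a simplification; the theoretical result assumes unbounded memory.

```python
def bf_to_python(program):
    """Translate Brainfuck source into equivalent Python source text."""
    lines = [
        "def run(input_bytes=b''):",
        "    tape, ptr, out, inp = [0] * 30000, 0, [], list(input_bytes)",
    ]
    # Each Brainfuck command maps to one fixed Python statement.
    simple = {
        ">": "ptr += 1",
        "<": "ptr -= 1",
        "+": "tape[ptr] = (tape[ptr] + 1) % 256",
        "-": "tape[ptr] = (tape[ptr] - 1) % 256",
        ".": "out.append(tape[ptr])",
        ",": "tape[ptr] = inp.pop(0) if inp else 0",
    }
    indent = 1
    for ch in program:
        if ch in simple:
            lines.append("    " * indent + simple[ch])
        elif ch == "[":
            lines.append("    " * indent + "while tape[ptr]:")
            indent += 1
        elif ch == "]":
            if indent == 1:
                raise SyntaxError("unmatched ']'")
            lines.append("    " * indent + "pass")  # keeps empty loops valid
            indent -= 1
        # any other character is a comment in Brainfuck and is skipped
    if indent != 1:
        raise SyntaxError("unmatched '['")
    lines.append("    return bytes(out)")
    return "\n".join(lines)

def load(program):
    """Run the generated Python source and hand back its run() function."""
    namespace = {}
    exec(bf_to_python(program), namespace)
    return namespace["run"]

hello = load("++++++++[>++++++++<-]>+.")  # computes 8 * 8 + 1 = 65
print(hello())
```

This prints `b'A'`. Turing completeness only tells you a translation like this exists; it says nothing about how efficient or readable the result is, and it doesn't cover things like hardware access or I/O facilities that one language has and the other lacks.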

As for your PHP-to-C assumption, there are "source-to-source" compilers like HipHop, but this isn't the common case. Most dynamically typed languages are compiled to bytecode and run on a virtual machine.

As for C, the compiler translates it essentially to assembly language for the target processor.

If you want to know more, you can read up on compiler design, abstract syntax trees, and language semantics. It's a lot to take in at once if you're new to it though, so Stack Overflow really isn't the best place to get started with a topic this big.

Upvotes: 2
