Reputation: 9811
I see that more and more people are switching to LLVM, especially people with a background in C or C++, so there is a pattern in which kind of people are approaching this compiler, what surprises me is the highly heterogeneous set of technologies that LLVM can manage, and I don't get what is pipeline that this virtual machine follows and what are the resulting benefits.
I would like to stress the fact that I'm focusing on LLVM, not really on clang.
A 1 in a million example is this one ( Youtube Video ), where the pipeline is not really obvious for me, or this other one, but apparently there a lot of totally different solutions where, for example, LLVM is used in conjunction with a JIT solution.
In short I see different syntax and semantics, people using LLVM to produce GPU shaders or binary objects, but I can't see the common denominator.
What is the meaning of "LLVM based compilation", Considering LLVM as a black box, what is the kind of input, output and the business logic in the middle ?
Upvotes: 2
Views: 537
Reputation: 414
LLVM's advantage over other source-available compilers is that it is designed as a set of reusable libraries. That means to some degree you can pick-and-choose what to include in your tool. Not every language tool needs optimization and not every language tool needs code generation. LLVM is a very flexible system for langauge processing.
Generally when people say, "LLVM based compilation," they mean using one or more of the LLVM libraries to implement their tool. They can leverage all of the work put into LLVM in understanding its IR and generating code for multiple targets.
The LLVM IR is the common representation used by most of the LLVM libraries. It is the interface you need to write to. For low-level stuff like machine code you will need to deal with some of the other LLVM representations (MachineInstr, MC, etc.).
As for writing a frontend to generate that LLVM IR, the tricky part is ensuring that the translation from your source language to the LLVM IR preserves the semantics of the source language. The LLVM IR has a well-defined but low-level set of semantics for each instruction. If your source language has higher-level semantics you will have to lower them into LLVM IR instruction sequences to implement it. For example, there is no LLVM instruction that handles C-style bitfield access so C language frontends must use a sequence of LLVM instructions to implement the functionality (generally shifts and bitwise operations).
As long as you implement the semantics of your source language in the LLVM IR correctly, the LLVM libraries will have no problem performing correct code transformations. If some desired transformation requires higher-level semantics information than LLVM IR can provide, you either have to do the transformation in some stage before converting to LLVM IR (and so you will have the high-level information available) or you can pass attribute information in the LLVM IR to convey the high-level semantics and write a custom LLVM pass to implement the transformation. It is usually far cleaner to do the former than the latter.
Upvotes: 2
Reputation: 26898
I can't see the common denominator.
The common denominator is converting code in one language to code in another language. And that's exactly what compilers do. So if you want to convert a piece of code in a "source language" to one in a "target language", what you need to do is:
That's it. Now you get to enjoy things like the optimizations LLVM performs on the code between its input and output, its common framework for "lowering" the code to something closer to machine code, etc.
GCC behaves the same, by the way, it's just that LLVM is considered by many to be superior in some aspects, particularly licensing and ease of modification.
Upvotes: 4