Reputation: 300
Whilst reading through a variety of articles about LLVM and its own documentation I've seen references to a property I find strange concerning the backwards compatibility of its IR.
Much of the documentation concerning the IR mentions that it is unstable and can break at pretty much any time. However, it is also often mentioned that the bitcode IR is more backwards compatible (as in 'often valid across more versions') than the text IR for a given LLVM version.
My understanding is that the text IR -> bitcode transformation is pretty much a direct mapping. Knowing this, why/how is it that the text IR is less compatible? I can't seem to find documentation on the actual mechanism that drives this behavior.
One example of such a statement about IR compatibility can be found here: http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility
Upvotes: 0
Views: 275
Reputation: 3717
I think the LLVM developers chose to promise more compatibility for bitcode because there are more obvious use cases that benefit from having backwards compatible bitcode than there are for the textual representation.
You may, for example, store and distribute libraries as LLVM bitcode and load them into some kind of just-in-time compiling interpreter for a user language. It could be inconvenient to have to upgrade the bitcode whenever the interpreter is upgraded. Bitcode is a machine interface that is more useful with well-defined compatibility.
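To make the "machine interface" point a bit more concrete: a loader can tell bitcode apart from textual IR just by looking at the first few bytes, because bitcode files begin with a fixed magic number. Below is a minimal sketch of that check. The magic values are the ones given in the LLVM bitcode file format documentation; the function name `classify_ir` and the returned labels are my own, purely for illustration, and are not part of any LLVM API.

```python
# Magic numbers as described in the LLVM bitcode file format documentation.
RAW_BITCODE_MAGIC = b"BC\xc0\xde"          # bytes 'B' 'C' 0xC0 0xDE
WRAPPER_MAGIC = bytes.fromhex("dec0170b")  # 0x0B17C0DE stored little-endian


def classify_ir(data: bytes) -> str:
    """Hypothetical helper: guess which IR representation `data` holds.

    Only inspects the leading bytes; it does not validate the rest of the file.
    """
    if data.startswith(RAW_BITCODE_MAGIC):
        return "bitcode"
    if data.startswith(WRAPPER_MAGIC):
        return "wrapped bitcode"
    # Textual IR has no magic number, so anything else is only a guess.
    return "text or unknown"
```

For example, `classify_ir(open(path, "rb").read(4))` would let a hypothetical interpreter decide which reader to hand a file to before doing any real parsing.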
Textual LLVM IR is more often used for debugging, documentation, or internal testing of LLVM, where changing any failing code whenever (infrequent) changes to the representation are made is more practical. It seems less likely that intermediate code is stored in the long term or distributed widely.
Since maintaining backwards compatibility can negatively impact both implementation effort (see e.g. a C++ parser) and the future evolution of a language (think of the struggles of modern programming languages to deal with retrospectively unwise decisions of the past), the LLVM community chose to only do so when there are clear benefits. I do not think there is a technical reason inherent in the formats, since both formats have to be parsed and handled correctly.
Upvotes: 1
Reputation: 8098
I'm speaking as a developer here, but not as one who has worked on LLVM itself.
As you would imagine, the bitcode is more like a virtual machine that has a specific instruction set, up to 64-bit, that will rarely change until quantum computer instructions are popular. As new 64-bit chips are delivered with incremental enhancements, it is not major surgery to 'append' new bitcode handling.
The IR, on the other hand, is a textual representation that will go through one or more passes to get to either bitcode or machine code. To support new capabilities there is obviously a net 'add' (new instructions) as well as modifications to existing ones. This will not only impact the IR parsers/emitters but potentially many intermediate/ephemeral data models used between them. Clearly this would also impact the C/C++ LLVM API. Maintaining backwards compatibility across all of that would be a very costly proposition indeed.
Upvotes: 1