Difficulty understanding the asymmetry in compatibility between LLVM text IR vs binary representation across versions

Question

Whilst reading through a variety of articles about LLVM and its own documentation I've seen references to a property I find strange concerning the backwards compatibility of its IR.

Much of the documentation concerning the IR mentions that it is unstable and can break at pretty much any time. However, it is also often mentioned that the bitcode IR is more backwards compatible (as in 'often valid across more versions') than the text IR for a given LLVM version.

My understanding is that the text IR -> bitcode transformation is pretty much a direct mapping. Knowing this, why/how is it that the text IR is less compatible? I can't seem to find documentation on the actual mechanism that drives this behavior.

One example of such a statement about IR compatibility can be found here: http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility

Frank C. · Accepted Answer

Speaking as a developer, though not one who developed LLVM.

As you would imagine, the bitcode is more like a virtual machine with a specific instruction set, up to 64-bit, that will rarely change until quantum computer instructions become popular. As new 64-bit chips are delivered with incremental enhancements, it is not major surgery to 'append' new bitcode handling.
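To make the 'append' idea concrete, here is a minimal toy sketch in Python. It is not LLVM's real bitcode encoding: the record codes, field layouts, and function names are all invented for illustration. It only shows the general pattern of a binary reader that dispatches on a numeric code, so a new layout can be added under a new code while the old one stays readable.

```python
# Toy model only: the record codes and layouts below are invented for
# illustration and are NOT LLVM's actual bitcode format.

def read_add_v1(fields):
    # Hypothetical original layout: two operands, no flags.
    lhs, rhs = fields
    return {"op": "add", "lhs": lhs, "rhs": rhs, "flags": 0}


def read_add_v2(fields):
    # Hypothetical later layout: the same operands plus a flags field.
    lhs, rhs, flags = fields
    return {"op": "add", "lhs": lhs, "rhs": rhs, "flags": flags}


# New codes are appended to the table; old codes keep their handlers,
# so records written by an older producer can still be decoded.
READERS = {
    1: read_add_v1,
    2: read_add_v2,
}


def read_record(code, fields):
    reader = READERS.get(code)
    if reader is None:
        raise ValueError(f"unknown record code {code}")
    return reader(fields)


old_record = read_record(1, [10, 20])     # written by an "old" producer
new_record = read_record(2, [10, 20, 1])  # written by a "new" producer
```

Both records decode to the same in-memory shape, with the old one getting a default value for the field it never had. Whether LLVM's reader is organised this way is an assumption on my part, so treat this as an analogy rather than a description of the implementation.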

The text IR, on the other hand, is a textual representation that goes through one or more passes to get to either bitcode or machine code. Supporting new capabilities obviously means a net 'add' (new instructions) as well as modifications to existing ones. This impacts not only the IR parsers/emitters but potentially many intermediate/ephemeral data models used between them. Clearly this would also impact the C/C++ LLVM API. Maintaining backwards compatibility across all of that would be a very costly proposition indeed.
