Steve F

Reputation: 153

How can I disassemble the result of LLVM MCJIT compilation?

I have a program I wrote which uses LLVM 3.5 as a JIT compiler, which I'm trying to update to use MCJIT in LLVM 3.7. I have it mostly working, but I'm struggling to reproduce one debug-only feature I implemented with LLVM 3.5.

I would like to be able to see the host machine code (e.g. x86, x64 or ARM, not LLVM IR) generated by the JIT process; in debug builds I log this out as my program is running. With LLVM 3.5 I was able to do this by invoking ExecutionEngine::runJITOnFunction() to fill in a llvm::MachineCodeInfo object, which gave me the start address and size of the generated code. I could then disassemble that code.

I can't seem to find any equivalent in MCJIT. I can get the start address of the function (e.g. via getPointerToFunction()) but not the size.

I have seen Disassemble Memory but apart from not having that much detail in the answers, it seems to be more about how to disassemble a sequence of bytes. I know how to do that, my question is: how can I get hold of the sequence of bytes in the first place?

If it would help to make this more concrete, please reinterpret this question as: "How can I extend the example Kaleidoscope JIT to show the machine code (x86, ARM, etc) it produces, not just the LLVM IR?"

Thanks.

Upvotes: 5

Views: 1213

Answers (2)

Dr. Koutheir Attouchi

Reputation: 1642

Include the following header llvm/Object/SymbolSize.h, and use the function llvm::object::computeSymbolSizes(ObjectFile&). You will need to get an instance of the ObjectFile somehow.

To get that instance, here is what you could do:

  1. Declare a class that is called to convert a Module to an ObjectFile, something like:

    class ModuleToObjectFileCompiler {
      ...
      // Compile a Module to an ObjectFile.
      llvm::object::OwningBinary<llvm::object::ObjectFile>
      operator() (llvm::Module&);
    };
  2. To implement the operator() of ModuleToObjectFileCompiler, take a look at llvm/ExecutionEngine/Orc/CompileUtils.h where the class SimpleCompiler is defined.

  3. Provide an instance of ModuleToObjectFileCompiler to an instance of llvm::orc::IRCompileLayer, for instance:

    new llvm::orc::IRCompileLayer<
        llvm::orc::ObjectLinkingLayer<llvm::orc::DoNothingOnNotifyLoaded>>(
        _object_layer, _module_to_object_file);

  4. The operator() of ModuleToObjectFileCompiler receives the instance of ObjectFile, which you can pass to computeSymbolSizes(). Then check the returned std::vector to find the sizes in bytes of all symbols defined in that Module, and save the information for the symbols you are interested in. And that's all.
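Putting the steps above together, a minimal sketch (assuming LLVM 3.7 and using ORC's SimpleCompiler as the Module-to-ObjectFile compiler; dumpSymbolSizes, TM and module are hypothetical names supplied by the caller) might look like:

```cpp
#include "llvm/ExecutionEngine/Orc/CompileUtils.h"
#include "llvm/Object/ObjectFile.h"
#include "llvm/Object/SymbolSize.h"
#include "llvm/Support/raw_ostream.h"

void dumpSymbolSizes(llvm::TargetMachine &TM, llvm::Module &module) {
  // SimpleCompiler compiles a Module to an in-memory ObjectFile.
  llvm::orc::SimpleCompiler compile(TM);
  llvm::object::OwningBinary<llvm::object::ObjectFile> obj = compile(module);

  // computeSymbolSizes pairs each symbol with its size in bytes.
  for (auto &sym : llvm::object::computeSymbolSizes(*obj.getBinary())) {
    llvm::ErrorOr<llvm::StringRef> name = sym.first.getName();
    if (name)
      llvm::errs() << *name << ": " << sym.second << " bytes\n";
  }
}
```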

Upvotes: 0

Vladislav Ivanishin

Reputation: 2152

You have at least two options here.

  1. Supply your own memory manager. This is well documented and is done in many projects using MCJIT. But for the sake of completeness, here's the code:

    class MCJITMemoryManager : public llvm::RTDyldMemoryManager {
    public:
      static std::unique_ptr<MCJITMemoryManager> Create();

      MCJITMemoryManager();
      virtual ~MCJITMemoryManager();

      // Allocate a memory block of (at least) the given size suitable for
      // executable code. The section_id is a unique identifier assigned by the
      // MCJIT engine, and optionally recorded by the memory manager to access
      // a loaded section.
      uint8_t* allocateCodeSection(uintptr_t size, unsigned alignment,
                                   unsigned section_id,
                                   llvm::StringRef section_name) override;

      // Allocate a memory block of (at least) the given size suitable for
      // data. The section_id is a unique identifier assigned by the JIT
      // engine, and optionally recorded by the memory manager to access a
      // loaded section.
      uint8_t* allocateDataSection(uintptr_t size, unsigned alignment,
                                   unsigned section_id,
                                   llvm::StringRef section_name,
                                   bool is_readonly) override;
      ...
    };
    

    Pass a memory manager instance to EngineBuilder:

    std::unique_ptr<MCJITMemoryManager> manager = MCJITMemoryManager::Create();
    llvm::ExecutionEngine* raw = llvm::EngineBuilder(std::move(module))
        .setMCJITMemoryManager(std::move(manager))
        ...
        .create();
    

    Now, via these callbacks, you have control over the memory where the code gets emitted (and the size is passed directly to your method). Simply remember the address and size of the buffer you allocated for the code section, then stop the program in gdb and disassemble that memory (or dump it somewhere, or even use LLVM's disassembler on it).
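    A minimal sketch of such a manager, assuming LLVM 3.7 and deriving from SectionMemoryManager so the base class does the actual allocation (RecordingMemoryManager and code_sections are hypothetical names):

```cpp
#include "llvm/ExecutionEngine/SectionMemoryManager.h"
#include <utility>
#include <vector>

class RecordingMemoryManager : public llvm::SectionMemoryManager {
public:
  // (address, size) of every emitted code section, for later disassembly.
  std::vector<std::pair<uint8_t*, uintptr_t>> code_sections;

  uint8_t* allocateCodeSection(uintptr_t size, unsigned alignment,
                               unsigned section_id,
                               llvm::StringRef section_name) override {
    // Let the base class do the real allocation, then record the range.
    uint8_t* addr = llvm::SectionMemoryManager::allocateCodeSection(
        size, alignment, section_id, section_name);
    code_sections.push_back({addr, size});
    return addr;
  }
};
```

    After ExecutionEngine::finalizeObject(), each recorded (address, size) pair spans the bytes of one emitted code section, ready to be fed to a disassembler.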

  2. Just use llc on your LLVM IR with the appropriate options (optimization level, etc.). As I see it, MCJIT is so called for a reason, and that reason is that it reuses the existing code generation modules (the same ones llc uses).
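For example, assuming the IR has been dumped to a hypothetical module.ll, the output should closely match what the JIT emits:

```shell
# Textual assembly for inspection.
llc -O2 -filetype=asm module.ll -o module.s
# Or an object file, to be disassembled with e.g. objdump -d.
llc -O2 -filetype=obj module.ll -o module.o
```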

Upvotes: 1
