soham

Reputation: 1676

What is getOpcode in LLVM?

What does the getOpcode() function return in the MCInstrDesc or MachineInstr class in the code generator part of LLVM? I cannot relate the value to the actual opcode of the machine.

For example, getOpcode() returns 2515 for the RET instruction on x86. However, the real x86 opcode is C3 (195 in decimal).

What is the relation?

Upvotes: 2

Views: 3622

Answers (2)

Celuk

Reputation: 897

I was struggling with the same question recently and finally figured it out. It has been a long time since you asked, but I am writing this in case someone is still confused.

As you said, getOpcode() returns an enum value that identifies the machine instruction. The numeric values are not stored in {BackendArchName}InstrInfo.td, as @dtolnay's answer might suggest. They are generated by TableGen when the LLVM libraries are built, and they are not related to the real opcode numbers. The value assigned to a given opcode can also change between LLVM versions or after custom changes to the backend.

That is reasonable, because it keeps backends customizable: you can add or remove opcodes in a backend, and the enum is regenerated the next time the library is built. If the values were fixed somewhere in the source tree before the build, adding new opcodes to a target, or changing small details like that, would probably be harder.
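
To make the distinction concrete, here is a minimal self-contained sketch of a toy target. It is not LLVM code: the `toy` namespace, `InstrRecord`, `Records`, and `encodingOf` are hypothetical names, and the encoding bytes are only illustrative. It shows the general idea that the enum value is a position in a generated list, while the machine encoding is separate data that has to be looked up.

```cpp
#include <cstdint>

// Hypothetical toy target; none of these names come from LLVM.
namespace toy {

// Stand-in for a generated opcode enum: the values are just positions
// in the instruction list, assigned when the list is generated.
enum Opcode : unsigned { ADD = 0, NOP = 1, RET = 2, NUM_OPCODES = 3 };

// Stand-in for a per-instruction record. The hardware encoding is a
// separate field, reached by using the enum value as a table index.
struct InstrRecord {
  const char *Name;
  std::uint8_t EncodingByte;
};

constexpr InstrRecord Records[NUM_OPCODES] = {
    {"ADD", 0x01}, // illustrative byte, not a real ISA encoding
    {"NOP", 0x90},
    {"RET", 0xC3},
};

// Look up the encoding byte that belongs to an opcode enum value.
constexpr std::uint8_t encodingOf(Opcode Op) {
  return Records[Op].EncodingByte;
}

} // namespace toy
```

In this toy model, inserting a new instruction ahead of RET in the list would change RET's enum value but not its encoding byte, which is why the two numbers need not match.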

So, after the build, you can find the files containing these opcode values in this folder:

{your-llvm-directory}/{your-llvm-build-directory}/lib/Target/{which-backend-target}

and this .inc file contains the opcode enum:

{which-backend-target}GenInstrInfo.inc

For example, after building the RISCV target on my machine, I can find the enum here:

~/llvm/llvm-project/build/lib/Target/RISCV/RISCVGenInstrInfo.inc

Part of the enum:

/*===- TableGen'erated file -------------------------------------*- C++ -*-===*\
|*                                                                            *|
|* Target Instruction Enum Values and Descriptors                             *|
|*                                                                            *|
|* Automatically generated file, do not edit!                                 *|
|*                                                                            *|
\*===----------------------------------------------------------------------===*/

#ifdef GET_INSTRINFO_ENUM
#undef GET_INSTRINFO_ENUM
namespace llvm {

namespace RISCV {
  enum {

// ........................................

    AND = 323,
    ANDI    = 324,
    ANDN    = 325,
    AUIPC   = 326,
    BDEP    = 327,
    BDEPW   = 328,
    BEQ = 329,
    BEXT    = 330,
    BEXTW   = 331,
    BFP = 332,
    BFPW    = 333,
    BGE = 334,
    BGEU    = 335,
    BLT = 336,
    BLTU    = 337,
    BMATFLIP    = 338,
    BMATOR  = 339,
    BMATXOR = 340,
    BNE = 341,
    CLMUL   = 342,
    CLMULH  = 343,
    CLMULHW = 344,
    CLMULR  = 345,
    CLMULRW = 346,
    CLMULW  = 347,
    CLZ = 348,
    CLZW    = 349,
    CMIX    = 350,
    CMOV    = 351,
    CRC32B  = 352,

// ........................................

    CRC32H  = 358,
    CRC32W  = 359,
    CSRRC   = 360,
    CSRRCI  = 361,
    CSRRS   = 362,
    CSRRSI  = 363,
    CSRRW   = 364,
    CSRRWI  = 365,
    CTZ = 366,
    CTZW    = 367,
    C_ADD   = 368,
    C_ADDI  = 369,
    C_ADDI16SP  = 370,
    C_ADDI4SPN  = 371,
    C_ADDIW = 372,
    C_ADDI_HINT_IMM_ZERO    = 373,
    C_ADDI_HINT_X0  = 374,
    C_ADDI_NOP  = 375,
// ........................................

  };

} // end namespace RISCV
} // end namespace llvm

I used RISCV as the example, but the process is the same for other targets such as X86. In your example, the value 2515 for RET should appear as an enum entry in the X86GenInstrInfo.inc file, like the entries above.

Because the enum is generated during the build, it is normal not to find it in an unbuilt LLVM source tree, such as the repository on GitHub.

EXTRA INFORMATION:

Because the enum numbers can change from one setup to another, you should not hard-code them when using the LLVM C++ API. Refer to opcodes by their enum names instead. For example:

#include "RISCV.h" // Backend header; needed to use the RISCV:: opcode enum.
#include "llvm/MC/MCInst.h"

// Some code here ...

// MI is a machine instruction, for example: const MCInst *MI
if (MI->getOpcode() == RISCV::ADD) {
  // Some code here ...
}

// Some code here ...

In the code above, it does not matter which number getOpcode() returns, because the comparison uses the name (RISCV header -> RISCV namespace -> opcode enum name). However, if you want the name that corresponds to a returned enum number, you can use the getOpcodeName function, for example:

#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstPrinter.h"
#include "llvm/ADT/StringRef.h" // for llvm::StringRef

// Some code here ...

// IP is an instruction printer, for example: MCInstPrinter &IP
StringRef opcodeName = IP.getOpcodeName(MI->getOpcode());

// Some code here ...
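
For readers without an LLVM build at hand, the number-to-name lookup can be sketched with a toy table. This is only the general idea of a name table indexed by opcode number, not how LLVM implements getOpcodeName. The `toy` namespace, `OpcodeNames`, and `opcodeName` are hypothetical names made up for this illustration.

```cpp
// Hypothetical toy target; these names are not part of LLVM.
namespace toy {

enum Opcode : unsigned { ADD = 0, NOP = 1, RET = 2, NUM_OPCODES = 3 };

// Names are stored in the same order as the enum, so the opcode
// number can be used directly as an index.
constexpr const char *OpcodeNames[NUM_OPCODES] = {"ADD", "NOP", "RET"};

// Returns the name for an opcode number, or "<unknown>" when the
// number is outside the table.
constexpr const char *opcodeName(unsigned Op) {
  return Op < NUM_OPCODES ? OpcodeNames[Op] : "<unknown>";
}

} // namespace toy
```

Because both the enum and the name table come from the same list, they stay in sync when the list changes, even though the individual numbers move.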

Upvotes: 0

dtolnay

Reputation: 10993

The getOpcode() member function on MCInstrDesc and MachineInstr returns the enum value that identifies which opcode in X86InstrInfo.td the instruction represents. In other backends, the numbering corresponds to that backend's instruction info, typically a file called [BACKEND]InstrInfo.td.

You can find an example of this being used in many of the X86 backend passes, for example the following code in X86ExpandPseudo.cpp that deals with tail call returns.

switch (MI.getOpcode()) {
default:
  return false;
case X86::TCRETURNdi:
case X86::TCRETURNdicc:
case X86::TCRETURNri:
case X86::TCRETURNmi:
case X86::TCRETURNdi64:
case X86::TCRETURNdi64cc:
case X86::TCRETURNri64:
case X86::TCRETURNmi64: {
  /* ... */
}
/* ... */
}
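
The same dispatch-by-name pattern can be tried without LLVM in a small sketch. The `toy` namespace, its enum entries, and `isTailCallReturn` are hypothetical and only borrow the flavor of the names above. The `constexpr` switch assumes C++14 or later.

```cpp
// Hypothetical toy target; these names are not part of LLVM.
namespace toy {

enum Opcode : unsigned { ADD = 0, RET = 1, TCRETURNdi = 2, TCRETURNri = 3 };

// Groups several opcodes under one result by name. The numeric values
// never appear in the logic, so renumbering the enum does not break it.
constexpr bool isTailCallReturn(unsigned Op) {
  switch (Op) {
  case TCRETURNdi:
  case TCRETURNri:
    return true;
  default:
    return false;
  }
}

} // namespace toy
```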

Upvotes: 4
