Reputation: 1
I am trying to design a data structure to hold abstract, assembly-like instructions. This is intended to be the intermediate representation for my compiler project.
For example: loadAI r1, 4 => r2 //is an instruction to load a value from the address computed from the base address stored in virtual register 1 (r1) with the offset 4 into virtual register 2 (r2) From a high level perspective, I want the instruction to be stored as the triple (loadAI, [r1, 4], [r2]). The general form is (opcode, [source operands list], [result operands list])
My issue is the actual implementation of such a data structure (in C++). The issues arises from the fact that the operands can be of different types.
In theory, I could just store each operand as a string and represent the IR that way. For example, a register r1 could be stored as "r1". Or if the operand is a constant such as 524288, it could be stored as "524288". This means the compiler has to read, compare, and manipulate string values when dealing with the IR -- this seems inefficient, especially when there is an obvious way to represent atleast some types of operands as numbers.
... so I thought through how things could work if I used numbers to represent operands. For example, register r1 could be represented with the integer value 1. However, there are other types of operands besides registers. For example, literal values. What if my compiler encounters long int = -32? The high level IR instruction might be loadI -32 => r2. Then that means my data structure needs to be able to store / handle operands of different types and meanings. Some operands might be registers. Some might be literals of a wide of types (char, short, int, long, float, double, signed vs. unsigned). How can I implement an Operand class where the underlying operand may have different representations?
After researching some C++ features (I'm a new to C++), it seemed like a union might be a good option to resolve the issue of an operand have a variety of possible types. In addition to to this, I would use an enum that enumerates all of the possible type of operands:
union OperandSymbol {
int regID;
int signed_int_const;
float float_const;
//... alot more types
}
enum {regID, signed_int_const, float_const, ... } //large enum
OperandSymbol opSym;
int operandType; //holds a value from the above enum
Now let's say the compiler code wants to examine an instruction's operand: We would first query that operand's operandType. We would then need to hardcode a switch statement with one case for each possible value of operandType (i.e. the size of enum which may be around 12). Based on which branch is taken, we access the corresponding member of the union. This is done everytime we access an operand of some instruction.
My final questions: Is this a reasonable design? Should I just represent operands with strings? If no, are there c++ features that solve this problem?
I am new to compilers, so I realize my lack of understanding may be the real source of my issue.
Upvotes: 0
Views: 66
Reputation: 67812
The thing you're describing is a discriminated union.
It's widely-used enough that there is standard library support for this idiom, in std::variant
.
So yes, in general it's a reasonable approach. Using a variant allows you to visit the contained type instead of writing a switch/case - this may be more expressive and/or convenient, but there's no way around handling every combination you allow.
If you can simplify the problem (eg, reduce the number of different types stored), that will also simplify the solution. Perhaps you don't need distinct types for every combination of signed and unsigned immediate integral value, but just store a uint64_t and interpret it according to the opcode.
Upvotes: 1