Reputation: 73
I'm writing a compiler for a programming language that integrates DSLs with C/C++. I chose LLVM for this for a couple of reasons.
There is a main program. In this main program I load bitcode files, which were compiled by clang. The loadable bitcode file represents a short, but complete programming language environment with REPL, parser, linker and AST.
My understanding so far was that boolean datatypes are represented in IR as i1. I optimized my code with -O3, and for a boolean I get the following IR (obtained by disassembling the generated bitcode file with llvm-dis):
%"class.tl::contrib::toy::ToyREPL" = type <{ %"class.tl::contrib::toy::InitLanguage"*, i8, [7 x i8] }>
The class is ToyREPL and it uses another class, InitLanguage. Oddly, the boolean seems to be represented by an i8 followed by an array of i8. I don't really get it.
I have defined a Makefile. First I compile the files. Then I link them into a bc file, optimize it, and link the result with some other libs.
@cd $(BIN)/$(TARGET)/$(2); $(LINK) -o $(1).$(BITCODE_EXT) $(3)
@cd $(BIN)/$(TARGET)/$(2); $(OPT) -O3 $(1).$(BITCODE_EXT) -o $(1).$(OPT_NAME).$(BITCODE_EXT) $(OPTIMIZER_FLAGS)
@$(LINK) -o $(BIN)/$(TARGET)/$(2)/$(1).$(BITCODE_EXT) $(BIN)/$(TARGET)/$(2)/$(1).$(OPT_NAME).bc $(LINK_OPTION) $(4)
Compiler flags are:
-v -g -emit-llvm -I$(BOOST_INC_DIR) -std=c++11 -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS
Optimizer flags are -std-link-opts
Link flag is -v
The relevant part of the Class ToyREPL is here:
class ToyREPL {
private:
InitLanguage *initLang;
bool runs = false;
Now my question: is my assumption wrong that a bool should be compiled to i1 in bitcode? Which compiler switch do I need to get i1? Let me know if you think my build process is wrong in some way. The generated bitcode file is readable, and I can retrieve the module and the class ToyREPL as a StructType.
Upvotes: 4
Views: 7370
Reputation: 26858
If I understand you correctly, your question is essentially - why was the C++ class
class ToyREPL {
bool runs = false;
...
};
compiled by Clang into type <{ i8, [7 x i8], ... }>?
So first of all, why Clang chose i8 over i1 for a boolean field is straightforward - the smallest C++ type takes one byte of memory, and unless you use bit-fields, that also applies to fields in structs. Also see this related question about why a whole byte is used for booleans. LLVM itself uses i1 for boolean values, but that's because it's roughly platform-independent - in the lowering phase those might become whole bytes again. So no compiler switch is needed: an i8 for a bool stored in a struct is the expected result.
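To illustrate the bit-field remark, here is a minimal sketch. The struct names SeparateFlags and PackedFlags are made up for this example and are not part of the question's code. Plain bool members each occupy at least one byte, while bool bit-fields may share a storage unit. The exact sizes are implementation-defined, so the code only prints them rather than assuming specific values.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical example type: eight ordinary bool members.
// Each one is a separate object and needs at least one byte.
struct SeparateFlags {
    bool a, b, c, d, e, f, g, h;
};

// Hypothetical example type: eight one-bit bool bit-fields.
// The compiler is allowed to pack these into a shared storage unit.
struct PackedFlags {
    bool a : 1, b : 1, c : 1, d : 1, e : 1, f : 1, g : 1, h : 1;
};

// Every complete object type has a size of at least one byte.
static_assert(sizeof(bool) >= 1, "bool occupies at least one byte");

// Prints both sizes; the numbers depend on the platform ABI.
void printFlagSizes() {
    std::printf("sizeof(SeparateFlags) = %zu\n", sizeof(SeparateFlags));
    std::printf("sizeof(PackedFlags)   = %zu\n", sizeof(PackedFlags));
}
```

Calling printFlagSizes() on your target shows whether the bit-field version is actually smaller there. Note that bit-fields cannot have their address taken, which is one reason a plain bool member stays a whole byte.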
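As for [7 x i8], that's padding. In the type from your question the bool follows a pointer, which needs 8-byte alignment on a 64-bit system, so the struct is padded up to a multiple of 8 bytes (8 for the pointer + 1 for the bool + 7 of padding = 16). This ensures every object of this type is 64-bit aligned, including consecutive elements of an array, and does not share its memory with any other object - a very reasonable approach on a 64-bit system. Alternatively, if there's a following struct field, the padding might have been inserted to ensure that field is 64-bit aligned.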
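The padding can be checked from C++ itself. The following is a minimal sketch, assuming a stand-in struct ToyREPLLayout (a hypothetical name) with the same two fields as the ToyREPL class in the question. The helper trailingPadding is also made up for this example. On a typical 64-bit target it would report a size of 16 and 7 trailing padding bytes, but those numbers are ABI-dependent, so the code computes them instead of hard-coding them.

```cpp
#include <cstddef>
#include <cstdio>

class InitLanguage;  // forward declaration; only a pointer to it is stored

// Hypothetical stand-in with the same two fields as ToyREPL.
// Members are public so that offsetof is well-defined (standard layout).
struct ToyREPLLayout {
    InitLanguage *initLang;
    bool runs = false;
};

// The size of a type is always a multiple of its alignment, which is
// what forces the compiler to add padding after the bool.
static_assert(sizeof(ToyREPLLayout) % alignof(ToyREPLLayout) == 0,
              "size must be a multiple of alignment");

// Number of bytes between the end of `runs` and the end of the object.
inline std::size_t trailingPadding() {
    return sizeof(ToyREPLLayout)
         - (offsetof(ToyREPLLayout, runs) + sizeof(bool));
}

// Prints the layout facts for the current target.
void printLayout() {
    std::printf("sizeof   = %zu\n", sizeof(ToyREPLLayout));
    std::printf("alignof  = %zu\n", alignof(ToyREPLLayout));
    std::printf("offset of runs   = %zu\n", offsetof(ToyREPLLayout, runs));
    std::printf("trailing padding = %zu\n", trailingPadding());
}
```

If printLayout() reports 7 bytes of trailing padding on your machine, that corresponds to the [7 x i8] member Clang wrote into the packed struct type.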
The Wikipedia article on alignment and padding is a useful starting point if you want to know more.
Upvotes: 7