repl

Reputation: 73

Data type for bool in LLVM IR

I'm writing a programming language compiler to integrate DSLs and C/C++. I chose LLVM for this for a couple of reasons.

There is a main program. In this main program I load bitcode files that were compiled by Clang. Each loadable bitcode file represents a short but complete programming language environment with REPL, parser, linker and AST.

My understanding so far was that boolean data types are represented in IR as i1. I optimized my code with -O3, and for a class with a boolean member I get the following IR (obtained by disassembling the generated bitcode file with llvm-dis):

%"class.tl::contrib::toy::ToyREPL" = type <{  %"class.tl::contrib::toy::InitLanguage"*, i8, [7 x i8] }>

The class is ToyREPL, and it uses another class, InitLanguage. Oddly, the boolean seems to be represented by an i8 and an array of i8. I don't really get it.

I have defined a Makefile. First I compile the files. Afterwards I link them into a .bc file, then optimize it and link it with some other libs.

@cd $(BIN)/$(TARGET)/$(2); $(LINK) -o $(1).$(BITCODE_EXT) $(3)

@cd $(BIN)/$(TARGET)/$(2); $(OPT) -O3 $(1).$(BITCODE_EXT) -o $(1).$(OPT_NAME).$(BITCODE_EXT) $(OPTIMIZER_FLAGS) 

@$(LINK) -o $(BIN)/$(TARGET)/$(2)/$(1).$(BITCODE_EXT) $(BIN)/$(TARGET)/$(2)/$(1).$(OPT_NAME).bc $(LINK_OPTION) $(4)

Compiler flags are:

-v -g -emit-llvm -I$(BOOST_INC_DIR) -std=c++11 -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS

Optimizer flags are -std-link-opts

Link flag is -v.

The relevant part of the class ToyREPL is here:

class ToyREPL {
private:

  InitLanguage *initLang;

  bool runs = false;

Now my question: is my assumption wrong that a bool should be compiled to i1 in bitcode? What compiler switch do I need to get i1? Let me know if you think my build process is wrong in some way. The generated bitcode file is readable, and I can retrieve the module and the class ToyREPL as a StructType.

Upvotes: 4

Views: 7370

Answers (1)

Oak

Reputation: 26858

If I understand you correctly, your question is essentially: why was the C++ class

class ToyREPL {
  bool runs = false;
  ...
};

compiled by Clang into type <{ i8, [7 x i8], ... }>?

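So first of all, why Clang chose i8 over i1 for a boolean field is straightforward: every C++ object must be addressable, so the smallest C++ type takes one byte of memory, and unless you use bit-fields, that also applies to fields in structs. Also see this related question about why a whole byte is used for booleans. LLVM itself uses i1 for boolean values that are not stored in memory (the result of a comparison, for example), because the IR is roughly platform-independent at that level; in the lowering phase those might become whole bytes again. So your assumption holds for values but not for struct fields, and I'm not aware of a compiler switch that changes this.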
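To make the byte-versus-bit point concrete, here is a small sketch. The struct names PlainFlags and PackedFlags are made up for illustration, and the exact packing of bit-fields is implementation-defined, so the checks only state what should hold on mainstream compilers:

```cpp
#include <cstddef>

// Each plain bool member is a separate addressable object,
// so it occupies at least one whole byte.
struct PlainFlags {
  bool a;
  bool b;
  bool c;
};

// Bit-fields are allowed to share a storage unit. How they are packed is
// implementation-defined, but mainstream ABIs fit these three into one byte.
struct PackedFlags {
  bool a : 1;
  bool b : 1;
  bool c : 1;
};

// Three plain bools need at least three bytes.
static_assert(sizeof(PlainFlags) >= 3 * sizeof(bool),
              "each plain bool member takes a full object");

// The bit-field version is never larger than the plain version.
static_assert(sizeof(PackedFlags) <= sizeof(PlainFlags),
              "bit-fields should not take more room than plain bools");
```

If you compile the plain version with -emit-llvm, I would expect each bool member to show up as an i8 field, just like runs does in your struct.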

As for [7 x i8], that's padding. In the IR you posted, the struct holds an 8-byte pointer followed by the one-byte bool, so 7 bytes of tail padding round the size up to 16 bytes. That keeps every object of this type 64-bit aligned, including consecutive elements of an array, which is the expected layout on a 64-bit system. In other structs, if there's a following field, the padding might instead have been inserted to ensure that field is 64-bit aligned.
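You can check the padding from the C++ side as well. The following is only a sketch: ToyREPLLayout is a hypothetical public mirror of the two members shown in the question (offsetof cannot reach private members from outside the class), and the numbers in the comments assume a typical 64-bit target where a pointer is 8 bytes and bool is 1 byte.

```cpp
#include <cstddef>

struct InitLanguage;  // only used through a pointer, so no definition is needed

// Hypothetical mirror of the two ToyREPL members from the question.
struct ToyREPLLayout {
  InitLanguage *initLang;  // 8 bytes on a typical 64-bit target
  bool runs;               // 1 byte, stored right after the pointer
};

// Bytes between the end of `runs` and the end of the struct:
// this is what shows up as [7 x i8] in the IR on a 64-bit target.
constexpr std::size_t kTailPadding =
    sizeof(ToyREPLLayout) - (offsetof(ToyREPLLayout, runs) + sizeof(bool));

// The struct inherits the alignment of its most strictly aligned member.
static_assert(alignof(ToyREPLLayout) == alignof(InitLanguage *),
              "struct alignment follows the pointer member");

// The size is always a multiple of the alignment, so array elements stay aligned.
static_assert(sizeof(ToyREPLLayout) % alignof(ToyREPLLayout) == 0,
              "size is rounded up to the alignment");

// The bool sits directly after the pointer.
static_assert(offsetof(ToyREPLLayout, runs) == sizeof(InitLanguage *),
              "no padding between the pointer and the bool");
```

On a 32-bit target the same struct would normally be 8 bytes with 3 bytes of tail padding, so the 7 in your IR comes from the target, not from anything in your build process.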

The Wikipedia article on alignment and padding is a useful starting point if you want to know more.

Upvotes: 7
