Tony

Reputation: 111

How to detect function pointers from assignment statements in LLVM IR?

I want to detect all occurrences of function pointers, in calls as well as in assignments. I am able to detect indirect calls, but how do I detect the assignments? Is there a character iterator in the Instruction.h header, or something similar, that iterates character by character over each instruction in the .ll file? Any help or links would be appreciated.

Upvotes: 0

Views: 442

Answers (1)

Nick Lewycky

Reputation: 1342

Start with unoptimized LLVM IR, then scan the use-list of every function.

In the comments we established that this is for a school project, not a production system. The difference when building a production system is that you should rely only on the properties that are guaranteed, while for research or prototyping or schoolwork, you can build something that happens to work by relying on properties that are not guaranteed. That's what we're going to do here.

It so happens (but is not guaranteed!) that as clang converts C to LLVM IR, it will[1] emit a store for every function pointer used. As long as we don't run the LLVM optimizer, they will still be there.

The "forwards" direction would be to look into instructions and see whether any of them do the action you want (assign or call a function pointer). I've got a better idea: let's do it backwards. Every llvm::Value has a list of all places where that value is used, called the use-list. For example, given %X = add i32 %Y, %Z, the instruction %X would appear in the use-list of %Y because %X is using %Y.

Starting with your Module, iterate over every function (i.e. for (auto &F : M->functions()) {), then scan the use-list for the function (i.e. for (const auto *U : F.users())) and look at those users (each U is already an llvm::User *, so there is nothing to unwrap; if you iterate F.uses() instead, call U.getUser() on each Use). If the user is an Instruction, you can query which Function this Instruction belongs to -- the Instruction has a parent BasicBlock and the BasicBlock has a parent Function. If it's a ConstantExpr, you'll need to recurse through that ConstantExpr's use-list until you find all the instructions using those constants. If the instruction is a direct call, you want to skip it (unless it's a direct call with an argument that is also a function pointer, like somefun(1, &somefun, 2);).
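To make the backwards walk concrete, here is a minimal sketch of the same logic on a toy model of use-lists. None of these types are LLVM's: Node, Use, addUse, isCalleeUse, collectPointerUses and demo are hypothetical names made up for illustration, and the "operand 0 of a call is the callee" rule is a convention of this toy only. In real LLVM code the corresponding pieces would be roughly Value::uses(), Use::getUser() and CallBase::isCallee(), but treat that mapping as a starting point to check against the headers, not as a drop-in implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for LLVM's use-list machinery. None of these are LLVM types.
enum class Kind { Function, ConstantExpr, Store, Call };

struct Node;

// One use-list entry: `user` has the value as its operand number `operandNo`.
struct Use {
  Node *user;
  unsigned operandNo;
};

struct Node {
  Kind kind;
  std::string name;
  std::string parentFunction;  // empty for functions and constants
  std::vector<Use> uses;       // everything that uses this value
};

// Record that `user` takes `value` as operand number `operandNo`.
void addUse(Node &value, Node &user, unsigned operandNo) {
  value.uses.push_back(Use{&user, operandNo});
}

// Toy convention (an assumption of this sketch): operand 0 of a Call is the
// callee, every other operand is an argument.
bool isCalleeUse(const Use &u) {
  return u.user->kind == Kind::Call && u.operandNo == 0;
}

// Walk backwards from `value` and collect every instruction that uses it as
// data (stored, passed as an argument, ...) rather than as a direct callee.
void collectPointerUses(const Node &value, std::vector<const Node *> &out) {
  for (const Use &u : value.uses) {
    if (u.user->kind == Kind::ConstantExpr) {
      // Constants live outside any function, so keep following their users.
      collectPointerUses(*u.user, out);
    } else if (value.kind == Kind::Function && isCalleeUse(u)) {
      continue;  // plain direct call: not a function-pointer use
    } else {
      out.push_back(u.user);
    }
  }
}

// Builds a small example and returns "instruction@function" for each hit.
std::vector<std::string> demo() {
  Node callee{Kind::Function, "callee", "", {}};
  Node cast{Kind::ConstantExpr, "cast", "", {}};
  Node store{Kind::Store, "store", "main", {}};
  Node directCall{Kind::Call, "direct_call", "main", {}};
  Node argCall{Kind::Call, "arg_call", "helper", {}};
  Node castStore{Kind::Store, "cast_store", "init", {}};

  addUse(callee, store, 0);       // fp = &callee;
  addUse(callee, directCall, 0);  // callee();
  addUse(callee, argCall, 2);     // somefun(1, &callee);
  addUse(callee, cast, 0);        // (void *)&callee
  addUse(cast, castStore, 0);     // p = (void *)&callee;

  std::vector<const Node *> hits;
  collectPointerUses(callee, hits);

  std::vector<std::string> report;
  for (const Node *n : hits)
    report.push_back(n->name + "@" + n->parentFunction);
  return report;
}
```

Calling demo() reports the store, the call that passes the function as an argument, and the store reached through the constant cast, while the plain direct call is skipped. The name after the @ plays the role of the instruction's parent Function.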

This will miss any code that uses C-typed function pointers that never point to any function, for instance C code like void (*fun_ptr)(int) = NULL;. If this is important to you, then I suggest writing a Clang AST tool with the RecursiveASTVisitor instead of using LLVM.

[1] clang is free not to. For instance, given if (0) { /* ... */ }, clang could skip emitting LLVM IR for the disabled code because it's faster to do that. If your function pointer use was in there, LLVM would never get the opportunity to see it.

Upvotes: 1
