joesmoe891

Reputation: 253

(clang) How to parse macros themselves, getting an AST where possible?

Hi, I'm using clang to extract information from C files, and I'm trying to extract the values of macros.

E.g. from this I'd want the value '13' or an AST (+ (* 3 4) 1):

#define SOME_CONSTANT 3*4+1

or from a function-like macro, I'd want an AST, e.g. (SOME_MACROFUNC (x y) (+ (add4 x) (* y 9))):

int add4(int q) {return q+4;}
#define SOME_MACROFUNC(x,y) add4(x)+y*9

So far I've managed to iterate through all the macros via the 'Preprocessor' class's macro_begin() and macro_end() functions.

From that I've gotten the macro names, and from the 'MacroInfo' class I've been able to find out whether each macro is function-like (including its parameter names). I also have access to the tokens in the macro, but I am only able to get the token kind, e.g. string_literal, identifier, comma, l_paren, r_paren, etc.

So two things:

  1. How do I access the actual value of the tokens, rather than just their kinds?

  2. Is there a way to generate an AST from the macros given their tokens? One way I thought of would be to parse my source code, extract the macros, and then, using their names, add code that uses those macros to my source and reparse it to get the AST.

e.g. Something like:

char *tempSOME_CONSTANT = SOME_CONSTANT;    
void tempSOME_MACROFUNC(char *x, char *y) {SOME_MACROFUNC(x,y);}

This method seems really hacky, though, and would probably have trouble with macros that aren't statement-like or expression-like.

Thanks.

edit: To clarify, I mainly want the fully expanded body of each macro (expanded until no macros are left, only non-macro tokens).

edit2: Somewhat solved:

If anyone's interested, I intend to expand the body of the macro manually.

Use "preprocessor.getSpelling(token)" to get the token value.

Use "preprocessor.getIdentifierTable().get(StringRef(spelling))" to get the IdentifierInfo for the token.

I'm using "clang\lib\Lex\PPMacroExpansion.cpp" as a reference.
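
To illustrate what "expand the body manually" could look like, here is a minimal sketch that works on plain token spellings (the strings getSpelling would give you) and does not use clang at all. The names `Tokens`, `MacroTable`, `expandTokens` and `expandMacro` are made up for this example. It only handles object-like macros and uses a simple "currently expanding" set to stop self-referential macros, which is a simplification of what the real preprocessor does.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical types for this sketch: a macro body is a list of token
// spellings, and the table maps macro names to their bodies.
using Tokens = std::vector<std::string>;
using MacroTable = std::map<std::string, Tokens>;

// Expand object-like macros in `body` until only non-macro tokens remain.
// `active` holds the macros currently being expanded, so a macro that
// mentions itself is left as-is instead of recursing forever.
Tokens expandTokens(const Tokens &body, const MacroTable &macros,
                    std::set<std::string> &active) {
  Tokens out;
  for (const std::string &tok : body) {
    auto it = macros.find(tok);
    if (it == macros.end() || active.count(tok)) {
      out.push_back(tok);  // not a macro, or already being expanded
      continue;
    }
    active.insert(tok);
    Tokens inner = expandTokens(it->second, macros, active);
    active.erase(tok);
    out.insert(out.end(), inner.begin(), inner.end());
  }
  return out;
}

// Fully expand the body of the macro called `name`.
Tokens expandMacro(const std::string &name, const MacroTable &macros) {
  auto it = macros.find(name);
  assert(it != macros.end() && "unknown macro");
  std::set<std::string> active{name};
  return expandTokens(it->second, macros, active);
}
```

Function-like macros would additionally need argument collection and substitution, which this sketch leaves out.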

Still thinking about how to pass it to the parser without reparsing the whole source tree, but that shouldn't be too difficult to figure out.

Thanks to Ira Baxter for the discussion; it helped me iron out the problem.

Upvotes: 14

Views: 4827

Answers (2)

David Airapetyan

Reputation: 5620

One hack that may or may not be useful is to temporarily redefine all function-like macros as functions. In your case:

#define SOME_MACROFUNC(x,y) add4(x)+y*9

will become

extern int SOME_MACROFUNC(int x, int y);

Then you can run this through a regular AST Matcher or visitor and do whatever processing you need (including refactoring, if appropriate). This doesn't scale well, but if the number of macros is well-known and limited, it could be an acceptable solution.

To make this cleaner, you can create a separate header file that is only used by clang and make sure clang finds it first when running your analysis. This way you don't need to modify your original source base.
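
As a rough sketch of the idea, the switch between the real macro and the stand-in declaration might look like the following. Here it is collapsed into one file with a made-up flag `MACRO_ANALYSIS` instead of a separate header, and `use_macro` is just a hypothetical caller. The stand-in has no definition, so the analysis configuration is only meant to be parsed (for example with -fsyntax-only), not linked.

```cpp
int add4(int q) { return q + 4; }

#ifdef MACRO_ANALYSIS  // hypothetical flag, defined only for the clang analysis run
// Stand-in declaration: calls now show up as ordinary CallExpr nodes in the AST.
extern int SOME_MACROFUNC(int x, int y);
#else
// The original macro, used for normal builds.
#define SOME_MACROFUNC(x, y) add4(x) + y * 9
#endif

// Hypothetical caller; identical source text in both configurations.
int use_macro(int a, int b) { return SOME_MACROFUNC(a, b); }
```

Note that the declaration forces a single signature on the macro, so this only fits macros that are always used with compatible argument types.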

Upvotes: 0

A. K.

Reputation: 38216

I am working on something very similar. I use the clang front end to collect the context (w.r.t. class, function, etc.) in which a macro is defined, and then use a (pseudo) expression parser to figure out whether the macro body is a valid expression or not. The ultimate goal is to transform the macro into a C++ declaration. Recently we got a paper accepted into ICSM-2012 that explains how we achieve this.
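
To give a feel for the "is the macro body a valid expression" check, here is a small recursive-descent sketch over token spellings. This is not the parser from the paper or the tool, just an illustration; `ExprParser` and `parseMacroBody` are made-up names, and it only understands +, -, *, /, parentheses, numbers, identifiers and simple calls. When the body parses, it also yields the kind of prefix AST the question asks for.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: parses a flat list of token spellings.
struct ExprParser {
  const std::vector<std::string> &toks;
  std::size_t pos;
  bool ok;

  explicit ExprParser(const std::vector<std::string> &t)
      : toks(t), pos(0), ok(true) {}

  bool at(const std::string &s) const {
    return pos < toks.size() && toks[pos] == s;
  }

  // primary := number | identifier | identifier '(' args ')' | '(' expr ')'
  std::string primary() {
    if (pos >= toks.size() || toks[pos].empty()) { ok = false; return ""; }
    std::string t = toks[pos++];
    if (t == "(") {
      std::string inner = additive();
      if (!at(")")) { ok = false; return ""; }
      ++pos;
      return inner;
    }
    if (!std::isalnum(static_cast<unsigned char>(t[0])) && t[0] != '_') {
      ok = false;  // an operator or punctuation where an operand was expected
      return "";
    }
    if (!at("(")) return t;  // plain number or identifier
    ++pos;                   // consume '(' of a call
    std::string call = "(" + t;
    if (!at(")")) {
      while (true) {
        call += " " + additive();
        if (!ok || !at(",")) break;
        ++pos;  // consume ','
      }
    }
    if (!at(")")) { ok = false; return ""; }
    ++pos;
    return call + ")";
  }

  // multiplicative := primary (('*' | '/') primary)*
  std::string multiplicative() {
    std::string lhs = primary();
    while (ok && (at("*") || at("/"))) {
      std::string op = toks[pos++];
      std::string rhs = primary();
      lhs = "(" + op + " " + lhs + " " + rhs + ")";
    }
    return lhs;
  }

  // additive := multiplicative (('+' | '-') multiplicative)*
  std::string additive() {
    std::string lhs = multiplicative();
    while (ok && (at("+") || at("-"))) {
      std::string op = toks[pos++];
      std::string rhs = multiplicative();
      lhs = "(" + op + " " + lhs + " " + rhs + ")";
    }
    return lhs;
  }
};

// Returns true and fills `sexpr` when the whole token list is one expression.
bool parseMacroBody(const std::vector<std::string> &toks, std::string &sexpr) {
  ExprParser p(toks);
  std::string result = p.additive();
  if (!p.ok || p.pos != toks.size()) return false;
  sexpr = result;
  return true;
}
```

A body such as `do {` is rejected because tokens are left over after the first operand, which is the signal that the macro is not expression-like.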

The tool used to get rid of macros, the demacrofier, is hosted here

Ira Baxter's examples are very insightful about the ways macros are used. However, such macros make up only a small percentage of macro use (see "An Empirical Analysis of C Preprocessor Use" by Ernst et al.). Currently, I am focusing more on the common cases.

Upvotes: 9
