BigBug

Reputation: 6290

Parsing C source file

If I have a C source file and I want to locate a specific local variable within a function and make it global, so that another tool (one I didn't write) is able to process the C file, what would be the easiest way to do this? I was thinking of using regex, but even that poses its own problems. It's kind of like writing a mini C parser in Java... a lot of work :S

Are there any libraries that can help make this easier?

For example, say I want to make the variable "i" into a global variable. The user will specify the function name and the variable name (but not the variable's type, i.e. "int").

I can use regex to find the function, sure. But from there I really don't know what the best approach would be. Will the CDT plugin help?

Example:

 /* 
  * add.c
  * a simple C program
  *
  */

#include <stdio.h>
#define LAST 10

int main()
{
    int i = 0;
    int sum = 0;

    for ( i = 1; i <= LAST; i++ ) {
      sum += i;
    } /*-for-*/
    printf("sum = %d\n", sum);

    return 0;
}

converted to:

 /* 
  * add.c
  * a simple C program
  *
  */

#include <stdio.h>
#define LAST 10

int i = 0;

int main()
{

    int sum = 0;

    for ( i = 1; i <= LAST; i++ ) {
      sum += i;
    } /*-for-*/
    printf("sum = %d\n", sum);

    return 0;
}

Upvotes: 1

Views: 683

Answers (3)

user207421

Reputation: 310885

The first thing I would demand is a complete specification of exactly when this is required and why, and how to identify when it is safe to do so without adversely affecting the program semantics. This is a really bad idea. Clearly those who gave you the assignment have no idea of either the implementation complexity, which is immense, or the adverse semantic effects. I am guessing that they will therefore be unable to come up with an adequate specification either, which will ultimately let you out.
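
To make the semantic risk concrete, here is a small sketch. The function names `nodes_local` and `nodes_lifted` are made up for illustration and are not taken from the question. It shows a recursive function whose loop counter is safe as a local but gets clobbered by the recursive calls once it is lifted to file scope.

```c
/* Original form: every call has its own private `i`. */
int nodes_local(int depth)
{
    int i;
    int total = 1;

    if (depth == 0)
        return 1;
    for (i = 0; i < 2; i++)
        total += nodes_local(depth - 1);
    return total;
}

/* After lifting: all calls share one `i`, so an inner call leaves it
 * at 2 and the outer loop stops after its first iteration. */
int i;

int nodes_lifted(int depth)
{
    int total = 1;

    if (depth == 0)
        return 1;
    for (i = 0; i < 2; i++)
        total += nodes_lifted(depth - 1);
    return total;
}
```

With these definitions, `nodes_local(2)` returns 7 while `nodes_lifted(2)` returns 4, even though the two bodies differ only in where `i` is declared.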

I would also draw their attention to this discussion, especially Ira Baxter's comments. I used to build compilers for a living. It is not a task to learn, or ask about, on a forum.

Upvotes: 1

Ira Baxter

Reputation: 95334

If you only handle trivial examples, you can hack this with Perl or some Java regex. It won't work reliably on complex programs, because for those you need a real parser.
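
As a hedged illustration of why the regex route breaks down, consider the following invented function (it is not from the question). It is valid C, yet a pattern such as `int\s+i\s*=` would match text inside a comment and inside a string literal, and it would miss the real declaration of `i`.

```c
/* Hypothetical input for a "lift i out of total" request. */
int total(const int *values, int n)
{
    /* int i = 0;  <- only a comment, but a regex for "int i" matches it */
    const char *note = "int i = 0;";   /* a string literal, matched as well */
    int sum = 0, i = 0;                /* the real declaration shares a line with sum */

    (void)note;                        /* silence the unused-variable warning */
    for (; i < n; i++)
        sum += values[i];
    return sum;
}
```

Telling these three cases apart requires knowing about comments, string literals and declarator lists, which is most of the way to a real C front end.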

Our DMS Software Reengineering Toolkit and its C Front End could be used to do this pretty reliably.

DMS provides general-purpose program analysis and transformation capability, parameterized by a programming language description. DMS's C Front End explains to DMS what the precise syntax is for C (for a variety of dialects of C, including GCC and MS); in effect it provides a complete parser that produces abstract syntax trees (and the inverse: a C code generator from the ASTs). This allows DMS to read C source files accurately, including preprocessing.

With the parsed code in AST form, you can build DMS functions and/or write patterns to find function definitions and, in particular, your targeted variable. DMS code, or alternatively source-to-source transforms, can then be used to lift the variable out of the function and/or insert code to track state changes of that variable so it can be seen.

So, with DMS and some custom code, you can achieve your desired effect. The example you provided is probably pretty simple to do with DMS, but the learning curve will still be steep; DMS is complex because the languages it handles are complex, and you have to learn how to use it. So, this isn't an afternoon's exercise for a newbie.

Note: you will want to do this to preprocessed programs (otherwise you generally won't be able to parse them reliably). So, this should be something you do just before compilation, and it shouldn't become part of the finalized code.
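
A small sketch of why preprocessing matters. The macro names `USE_LONG_INDEX`, `INDEX_TYPE` and `DECLARE_INDEX` are invented for this example. Before preprocessing there is no textual declaration of `i` to find, and its type depends on a build flag.

```c
/* Hypothetical build switch: selects the type used for loop indices. */
#ifdef USE_LONG_INDEX
#define INDEX_TYPE long
#else
#define INDEX_TYPE int
#endif

/* Hypothetical helper macro that hides the declaration syntax. */
#define DECLARE_INDEX(name) INDEX_TYPE name = 0

int count_up(int limit)
{
    DECLARE_INDEX(i);   /* expands to `int i = 0` or `long i = 0` */
    int n = 0;

    for (i = 0; i < limit; i++)
        n++;
    return n;
}
```

Running the file through the preprocessor first (for example with `gcc -E`) turns the macro line into an ordinary declaration that a parser can see.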

If you want to make permanent code changes, you'll need to parse the unpreprocessed code; that's a heckuva lot harder. DMS's C Front End can do this to the extent that the preprocessor directives are "structured"; about 95% of them are. So now you have a new problem: either fix the unstructured ones (a one-time manual change), or reject files that can't be parsed with "tough luck".

You might use GCC instead of DMS; after all, it has a very well-tested C parser. It won't help you generate modified C code, though. Another alternative is Clang, which is coming up fast as a pretty good alternative. It parses C as well as C++, though I'm not sure about the particular dialect of C your end user may be using (you didn't say). It has ASTs like DMS, and a kind of scheme for generating "patches" to code that might work.

Upvotes: 2

Johan Lundberg

Reputation: 27028

Even if you are able to come up with a way to make such transformations, I don't think it's a good idea. The program will not stay the same, since you move construction and destruction around. Also, not all types are default-constructible or copyable, so in general the transformation is not possible.
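
In plain C the same issue shows up as initialization timing. A minimal sketch, with made-up function names: the local is reset on every call, whereas a file-scope variable is initialized only once, at program startup.

```c
/* Original: `count` starts at 0 on every call. */
int count_a_local(const char *s)
{
    int count = 0;

    for (; *s != '\0'; s++)
        if (*s == 'a')
            count++;
    return count;
}

/* After a naive lift: the initializer runs once, so the value
 * carries over from one call to the next. */
int count = 0;

int count_a_lifted(const char *s)
{
    for (; *s != '\0'; s++)
        if (*s == 'a')
            count++;
    return count;
}
```

Calling `count_a_local("banana")` twice returns 3 both times; calling `count_a_lifted("banana")` twice returns 3 and then 6. A correct transformation would have to keep an assignment `count = 0;` at the original declaration site.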

Are you interested in only a few simple types? Then make that part of the solution. Is the original code generated? If not, how can you reliably identify local objects by name alone? The same name may also be used for objects of different types.
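
For instance, in this invented function the name `i` refers to two different objects of two different types, so a function name plus a variable name does not pick out a single declaration.

```c
int scale_and_add(int n)
{
    int i = n;               /* outer i: an int */

    {
        double i = 0.5;      /* inner i: a double that shadows the outer one */
        n = (int)(n * i);
    }
    return n + i;            /* the outer int i is visible again here */
}
```

Lifting "the variable `i` in `scale_and_add`" is ambiguous here, and lifting both declarations under the same name would not compile.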

Upvotes: 1
