Dan McClain

Reputation: 11920

Using C# and regex to parse source code and find function calls with arguments

I have a list of function calls stored in a database, and for some of them I care about what the arguments of the call are. I am parsing C source code with my program (which is written in C#), and I'm trying to find the best way to get the function calls together with their arguments. I read the source code into a string before parsing it (so that I am not using a stream reader on the file). I tried using a regex (which is somewhat new to me) to parse the source file, but I was retrieving more than just the function call when using a pattern like this: functionCall + ".*\\)"; (I am escaping the opening ( in the function call.)
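A minimal C# reproduction of the over-match, assuming a made-up source line and a hypothetical function name some_Call( (the GreedyDemo class and FirstMatch helper are also illustrative names). Regex.Escape turns some_Call( into some_Call\(, and the greedy .* then runs to the last ) on the line rather than the one that closes the call.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyDemo
{
    // Builds the pattern the way the question describes: escaped call prefix + ".*\)"
    public static string FirstMatch(string functionCall, string source)
    {
        string pattern = Regex.Escape(functionCall) + ".*\\)";
        Match m = Regex.Match(source, pattern);
        return m.Success ? m.Value : null;
    }

    static void Main()
    {
        // Hypothetical line of C source containing two calls
        string source = "x = some_Call(a, b) + other(c);";

        // Greedy ".*" runs to the LAST ')' on the line, not the one closing some_Call
        Console.WriteLine(FirstMatch("some_Call(", source));
    }
}
```

Note that . does not match a newline by default, so the over-match stops at the end of the line; with RegexOptions.Singleline it could run on to the last ) in the whole file.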

The function calls are stored in the DB in the following format:

Function Call
============
some_Call(

There is a reason they are stored this way, and that will not change.
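Because each stored entry already ends in (, one possible way to turn it into a search pattern is sketched below; the CallPattern class and FromStoredCall method are hypothetical. It assumes every entry ends with a single (, anchors the name with \b so that a longer identifier such as my_some_Call( does not match, and allows optional whitespace before the parenthesis, which C permits.

```csharp
using System;
using System.Text.RegularExpressions;

static class CallPattern
{
    // Assumes every stored entry ends with "(" (e.g. "some_Call("), as in the DB.
    public static Regex FromStoredCall(string stored)
    {
        string name = stored.TrimEnd('(');
        // \b stops "my_some_Call(" from matching; \s* also accepts "some_Call ("
        return new Regex(@"\b" + Regex.Escape(name) + @"\s*\(");
    }

    static void Main()
    {
        Regex r = FromStoredCall("some_Call(");
        string source = "my_some_Call(1); some_Call (2); some_Call(3);";

        // Print where each genuine call starts and what was matched
        foreach (Match m in r.Matches(source))
            Console.WriteLine(m.Index + ": " + m.Value);
    }
}
```

This only locates the start of each call; finding where its argument list ends is the harder part.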

Is there a good way to do this with a regex, or would I be better off walking through the source code contents?

Let me know if any clarification is needed.

Upvotes: 2

Views: 8782

Answers (3)

Daniel LeCheminant

Reputation: 51081

Part of the reason your solution failed is that you probably should have used lazy matching, .*?\), instead of greedy matching, .*\).
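A quick C# check of the lazy version, using two of the inputs discussed in this answer (LazyDemo and FirstCall are hypothetical names). The lazy .*? stops at the first ) after the opening parenthesis, which fixes the two-calls-on-one-line case but cuts nested arguments short.

```csharp
using System;
using System.Text.RegularExpressions;

static class LazyDemo
{
    // ".*?" is lazy: it stops at the FIRST ')' after the opening parenthesis
    static readonly Regex Call = new Regex(@"functionCall\(.*?\)");

    public static string FirstCall(string source)
    {
        Match m = Call.Match(source);
        return m.Success ? m.Value : null;
    }

    static void Main()
    {
        // Lazy matching handles two calls on one line...
        Console.WriteLine(FirstCall("functionCall(1) + functionCall(2) + (2 * 3)"));
        // ...but stops too early once the arguments contain nested parentheses
        Console.WriteLine(FirstCall("functionCall((1+(1))*(2+2))"));
    }
}
```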

A complete answer would have to handle at least these cases:

Ignore parentheses in strings and chars (which you can do with a regex, although escaping makes it a little complicated)

functionCall("\")", ')')

Ignore parentheses in comments (which you can do with a regex)

functionCall(/*)*/ 1, // )
2)

Don't match too much (which you can do with a regex)

functionCall(1) + functionCall(2) + (2 * 3) // Don't match past the first )

but it would also have to skip over balanced (nested) parentheses

functionCall((1+(1))*(2+2))

This last one is something you can't do with a normal regex, because it involves counting parentheses, which regexes are generally not suited for. However, the .NET regex engine supports this through an extension called balancing groups.
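A sketch of the balancing-group approach in C#, assuming parentheses never appear inside string/char literals or comments (BalancedCall and Arguments are hypothetical names). Each ( pushes a capture onto the Depth group, each ) pops one, and the conditional (?(Depth)(?!)) makes the match fail if any ( is left unclosed.

```csharp
using System;
using System.Text.RegularExpressions;

static class BalancedCall
{
    // Argument-list body, written for RegexOptions.IgnorePatternWhitespace
    const string Balanced = @"
        (?<args>
          (?>
              [^()]+            # any run of non-parenthesis characters
            | \( (?<Depth>)     # an opening paren pushes a capture onto Depth
            | \) (?<-Depth>)    # a closing paren pops one; fails if Depth is empty
          )*
          (?(Depth)(?!))        # fail if any opening paren is still unclosed
        )";

    // Returns the text between the call's own parentheses, or null if not found.
    public static string Arguments(string name, string source)
    {
        string pattern = @"\b" + Regex.Escape(name) + @"\(" + Balanced + @"\)";
        Match m = Regex.Match(source, pattern, RegexOptions.IgnorePatternWhitespace);
        return m.Success ? m.Groups["args"].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(Arguments("functionCall", "x = functionCall((1+(1))*(2+2)) + 3;"));
        Console.WriteLine(Arguments("functionCall", "functionCall(1) + functionCall(2) + (2 * 3)"));
    }
}
```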

(And technically you would have to handle macros; I can imagine that a

#define close_paren )

would ruin your day...)

That said, you could likely come up with a naive solution (similar to what you had, or what some other poster recommends) and it would work for many cases, especially if you're working with known inputs.
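If you would rather walk through the source contents, a character-by-character scan can cover the first four cases above. This is a hypothetical helper (CallScanner, FindClose and FirstCallArguments are made-up names), not a real C parser: it skips string and char literals (honouring backslash escapes) and both comment styles, and counts nesting depth, but it does not expand macros, and the initial search for the stored entry is a plain IndexOf that can still hit a comment, a string, or the tail of a longer identifier.

```csharp
using System;

static class CallScanner
{
    // Given the index just past a call's opening '(', returns the index of the
    // matching ')', or -1 if the source ends first.
    public static int FindClose(string src, int i)
    {
        int depth = 1;
        while (i < src.Length)
        {
            char c = src[i];
            if (c == '"' || c == '\'')
            {
                // String or char literal: skip to the closing quote, honouring backslash escapes
                i++;
                while (i < src.Length && src[i] != c)
                    i += src[i] == '\\' ? 2 : 1;
                i++; // step past the closing quote
            }
            else if (c == '/' && i + 1 < src.Length && src[i + 1] == '/')
            {
                // Line comment: resume at the start of the next line
                int nl = src.IndexOf('\n', i);
                i = nl < 0 ? src.Length : nl + 1;
            }
            else if (c == '/' && i + 1 < src.Length && src[i + 1] == '*')
            {
                // Block comment: resume after the closing */
                int end = src.IndexOf("*/", i + 2, StringComparison.Ordinal);
                i = end < 0 ? src.Length : end + 2;
            }
            else
            {
                if (c == '(') depth++;
                else if (c == ')' && --depth == 0) return i;
                i++;
            }
        }
        return -1;
    }

    // storedCall is an entry as kept in the DB, e.g. "some_Call(".
    // Returns the raw argument text of its first occurrence, or null.
    public static string FirstCallArguments(string src, string storedCall)
    {
        int start = src.IndexOf(storedCall, StringComparison.Ordinal);
        if (start < 0) return null;
        int open = start + storedCall.Length;
        int close = FindClose(src, open);
        return close < 0 ? null : src.Substring(open, close - open);
    }

    static void Main()
    {
        Console.WriteLine(FirstCallArguments("functionCall(\"\\\")\", ')')", "functionCall("));
        Console.WriteLine(FirstCallArguments("functionCall(/*)*/ 1, // )\n2)", "functionCall("));
        Console.WriteLine(FirstCallArguments("functionCall((1+(1))*(2+2))", "functionCall("));
    }
}
```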

Upvotes: 6

chakrit

Reputation: 61518

Not to discourage you, but... in C, I believe (vaguely) that you can do this:

void secondFunction(void) { /* no-op */ }

void firstFunction(void)
{
    void (*xyz)(void) = secondFunction; // xyz is a pointer to a function

    xyz(); // this calls secondFunction, yet "secondFunction(" never appears here
}
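To make this concrete, here is a hypothetical C# check (IndirectCallDemo and CountTextualCalls are made-up names) that runs a whole-word secondFunction( pattern over a version of the snippet written with a function-pointer declaration. The single hit is the definition, not a call, while the actual call through xyz is not found at all.

```csharp
using System;
using System.Text.RegularExpressions;

static class IndirectCallDemo
{
    // C snippet in which the only call to secondFunction goes through a pointer
    public const string CSource =
        "void secondFunction(void) { }\n" +
        "void firstFunction(void)\n" +
        "{\n" +
        "    void (*xyz)(void) = secondFunction;\n" +
        "    xyz();\n" +
        "}\n";

    // Counts whole-word "name(" occurrences, i.e. what a regex search would report
    public static int CountTextualCalls(string source, string name)
    {
        return Regex.Matches(source, @"\b" + Regex.Escape(name) + @"\s*\(").Count;
    }

    static void Main()
    {
        // The one hit is the definition line; the real call, xyz(), is invisible to the search
        Console.WriteLine(CountTextualCalls(CSource, "secondFunction"));
    }
}
```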

Is that a possible scenario? And what about other kinds of pointer usage?!?

Say, functional-style type casting (that is C++ syntax rather than C)?!?

int a;
float b = float(a); // call to the "float" function?!? NO! it's a type cast

Use a list of predefined types? What if the conversion is to a custom struct, and what about typedefs? Now you'd have to parse those too!

Seriously, use a parser!! There are already several options available that can parse C.

I think regex is a rather bad tool for the job.

Upvotes: 0

REA_ANDREW

Reputation: 10764

I have written a quick regex and tested it; check the following:

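    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string tst = "some_function(type<whatever> tesxt_112,type<whatever> tesxt_113){";

            // Group 1 captures the text inside the parentheses (both .* are greedy)
            Regex r = new Regex(@".*\((.*)\)");
            Match m = r.Match(tst);
            if (m.Success)
            {
                // Naive split: breaks if an argument itself contains a comma
                string[] arguments = m.Groups[1].Value.Split(',');
                for (int i = 0; i < arguments.Length; i++)
                {
                    Console.WriteLine("Argument " + (i + 1) + " = " + arguments[i]);
                }
            }

            Console.ReadKey(); // keep the console window open until a key is pressed
        }
    }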

So the output for the above string would be:

Argument 1 = type<whatever> tesxt_112

Argument 2 = type<whatever> tesxt_113
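One caveat with Split(','): it also splits at commas inside nested calls, so an argument such as max(a, b) would be cut in half. A depth-aware split is sketched below; ArgumentSplitter and SplitTopLevel are hypothetical names, and it assumes no brackets or commas appear inside string/char literals or comments.

```csharp
using System;
using System.Collections.Generic;

static class ArgumentSplitter
{
    // Splits an argument list at commas that are not nested inside (), [] or {}.
    public static List<string> SplitTopLevel(string args)
    {
        var result = new List<string>();
        int depth = 0, start = 0;
        for (int i = 0; i < args.Length; i++)
        {
            char c = args[i];
            if (c == '(' || c == '[' || c == '{') depth++;
            else if (c == ')' || c == ']' || c == '}') depth--;
            else if (c == ',' && depth == 0)
            {
                // Top-level comma: close off the current argument
                result.Add(args.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }
        result.Add(args.Substring(start).Trim()); // last argument
        return result;
    }

    static void Main()
    {
        // A plain Split(',') would give four pieces here; this gives two
        foreach (string a in SplitTopLevel("max(a, b), foo(c, d)"))
            Console.WriteLine(a);
    }
}
```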

Hope this helps!

Andrew :-)

Upvotes: 1
