Yoram Abargel

Reputation: 79

Finding functions within C code using regex in Python

I am trying to extract all the functions in a C file using a regex. Here is a typical example file:

int main()
{
    printf("hello to all the good people");
    printf("hello to all the good people %d ", GetLastError());

    for(int i =0; i<15; i++)
    {
        if(i == 5)
        {
            switch(i)
            {
                case 0:
                    break; 
            }
        }
    }
}

So far, I have only managed to capture functions with the following regex:

import re

# _content holds the text of the C source file
regex = re.findall(r'\w+\s*?[(].*[)]', _content)  # earlier attempt: r'\w+\s*?[(]*[)]'
for i in regex:
    print(i)

My problems are:

  1. How can I tell it to ignore keywords such as for or switch?
  2. How do I make it find a function call nested inside another call? For example:

printf ("%s", get_string());

  3. How do I make it treat parentheses inside quotes differently from parentheses outside quotes? For example, given the line printf("hello to j. (and rona) %s", get_family_name()); it should extract:

    function name:     parameters:
    printf             "hello to j. (and rona) %s", get_family_name()
    get_family_name    none

Upvotes: 1

Views: 52

Answers (1)

ash

Reputation: 5549

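You cannot parse C using regular expressions. Calls nest to arbitrary depth, and parentheses can also appear inside string literals and comments; a pattern like yours has no way to keep track of either.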

There is another question about parsing HTML with regex; the answer given there applies also to C, and to essentially any useful programming language.
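To make the problem concrete, here is a small sketch that runs the pattern from the question on two of its own lines. The LAZY variant is a hypothetical "fix" of my own, not something from the question, included only to show that tweaking greediness just moves the failure around.

```python
import re

# The pattern from the question, written as a raw string.
GREEDY = r'\w+\s*?[(].*[)]'
# Hypothetical variant: make the argument part non-greedy.
LAZY = r'\w+\s*[(].*?[)]'

nested = 'printf("hello to j. (and rona) %s", get_family_name());'
loop = 'for(int i =0; i<15; i++)'

# Greedy: one big match, so the nested get_family_name() call is never reported.
greedy_nested = re.findall(GREEDY, nested)
# Lazy: the printf match stops at the ")" inside the string literal.
lazy_nested = re.findall(LAZY, nested)
# Keywords followed by "(" look exactly like calls to the pattern.
greedy_loop = re.findall(GREEDY, loop)

print(greedy_nested)
print(lazy_nested)
print(greedy_loop)
```

Neither variant can tell a keyword from a function name or a quoted parenthesis from a real one, which is why a real parser is the safer route.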

The pycparser library looks like it might be useful, particularly the func_calls example – in fact, I think the following snippet (adapted from that example) will do exactly what you want, although I haven't tested it:

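from pycparser import c_ast, parse_file

class FuncCallVisitor(c_ast.NodeVisitor):
    def visit_FuncCall(self, node):
        # node.name is a c_ast.ID for ordinary calls such as printf(...)
        if isinstance(node.name, c_ast.ID):
            print("{} called at {}".format(node.name.name, node.name.coord))
        # Visit the arguments too, so nested calls like get_string() are found
        if node.args:
            self.visit(node.args)

# use_cpp=True runs the file through the C preprocessor first
ast = parse_file("myfile.c", use_cpp=True)
v = FuncCallVisitor()
v.visit(ast)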
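If you also want the parameters of each call, something along these lines might work. It is only a sketch: CallCollector and SOURCE are names I made up for illustration, it assumes pycparser is installed, and it parses a string directly with c_parser.CParser so that no preprocessor is involved. The argument text is regenerated from the AST with c_generator.CGenerator, so spacing may differ from the original file.

```python
from pycparser import c_ast, c_generator, c_parser

# Hypothetical sample input; it has no #include lines, so no preprocessor is needed.
SOURCE = r'''
void demo(void)
{
    printf("hello to j. (and rona) %s", get_family_name());
}
'''

class CallCollector(c_ast.NodeVisitor):
    """Collects (function name, argument text) pairs, including nested calls."""

    def __init__(self):
        self.calls = []
        self._gen = c_generator.CGenerator()

    def visit_FuncCall(self, node):
        if isinstance(node.name, c_ast.ID):
            # Regenerate the argument list as C source; empty string when there are no arguments.
            params = self._gen.visit(node.args) if node.args else ""
            self.calls.append((node.name.name, params))
        # Descend into the arguments so nested calls are collected as well.
        if node.args:
            self.visit(node.args)

ast = c_parser.CParser().parse(SOURCE)
collector = CallCollector()
collector.visit(ast)
for name, params in collector.calls:
    print("{}: {}".format(name, params or "none"))
```

Because the parser already knows what is a string literal and what is a keyword, the quoted "(and rona)" and constructs like for or switch never show up as calls.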

Upvotes: 1
