edvas

Reputation: 435

Regex: Returned type of C functions

I'm trying to write a regular expression that extracts only the return type of any (see edit) C function in a C file, ignoring spaces and newlines, but I'm not having any luck with it. Edit: the only return types I have to consider are the basic C data types.

Example:

 signed    
     long    long 
    int function1   ( int j, int n)

should give me:

signed long long int

How can I write (or think of a solution for) this regular expression?

Upvotes: 0

Views: 162

Answers (2)

Floris

Reputation: 46375

The hardest part of the problem is probably answering the question "how can I tell that I have reached the start of a function definition?" Given the various rules of C, it's not clear that there is a surefire answer, so the best you can probably do is come up with a rule that catches most situations.

Function definitions will have

  • A return type with possible qualifier (one or more of void, signed, unsigned, short, long, char, int, float, double, *)
  • Followed by a word
  • Followed by an open parenthesis.

This means something like the following should work (demo: http://regex101.com/r/oJ3xS5):

((?:(?:void|unsigned|signed|long|short|float|double|int|char|\*)(?:\s*))+)(\w+)\s*\(

Note: this does not clean up the formatting, so a return type that spans multiple lines will still span multiple lines in the match. It does have the advantage (compared to other solutions) that it looks specifically for the basic types that are defined in the link in your question.

Also note: you need the g flag to capture all the instances, and I capture the function name in its own capturing group, (\w+). If you don't need it, you can leave out those parentheses, but I thought that having both the return type and the function name might be useful.
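As a rough illustration of the above, here is a minimal Python sketch that applies the pattern unchanged. The names PATTERN, return_types and sample are made up for this example, and using Python's re module is an assumption (the question does not say which regex engine is in use). re.finditer plays the role of the g flag, and the whitespace inside the captured type is collapsed afterwards.

```python
import re

# The pattern from this answer, copied verbatim:
# group 1 is the return type, group 2 is the function name.
PATTERN = re.compile(
    r"((?:(?:void|unsigned|signed|long|short|float|double|int|char|\*)(?:\s*))+)(\w+)\s*\("
)


def return_types(source):
    """Return (return_type, function_name) pairs found in a C source string."""
    results = []
    for match in PATTERN.finditer(source):  # finditer acts like the g flag
        # Collapse spaces and newlines inside the captured type to single spaces.
        return_type = " ".join(match.group(1).split())
        results.append((return_type, match.group(2)))
    return results


# The example from the question (hypothetical variable name).
sample = " signed    \n     long    long \n    int function1   ( int j, int n)"
print(return_types(sample))  # [('signed long long int', 'function1')]
```

One caveat worth checking against your own code: the pattern has no word boundaries, so a call such as printf( also matches, with int taken from the middle of the name and f reported as the function name.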

Afterthought: if you first collapse runs of whitespace and newlines into single spaces, the above will still work, but the return type will no longer contain runs of extra whitespace. For instance, you could run your code through

tr '\n' ' ' < source.c | sed -E 's/[[:space:]]+/ /g' > strippedSource.c

then process with the regex above.
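If tr and sed are not at hand, the same preprocessing step can be sketched in Python. This is only an assumed equivalent of the shell pipeline, not something from the original answer, and the names collapse_whitespace and sample are hypothetical.

```python
import re


def collapse_whitespace(source):
    """Replace every run of spaces, tabs and newlines with a single space."""
    return re.sub(r"\s+", " ", source)


# The example from the question (hypothetical variable name).
sample = " signed    \n     long    long \n    int function1   ( int j, int n)"
print(collapse_whitespace(sample))
# prints " signed long long int function1 ( int j, int n)" (without the quotes)
```

A leading or trailing run is reduced to one space rather than removed, so a final strip() on the extracted return type may still be wanted.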

Upvotes: 1

Jongware

Reputation: 22457

Join all the type keywords using the OR (alternation) operator:

\b((void|unsigned|signed|char|short|int|long|float|double)\s*)+\b

The \b at the start and end prevents parts of function names from being matched (void longjmp comes to mind).

This will not catch typedefs such as uchar_8, or complicated pointer-to-pointer constructions such as void (* int)(*) (I just made this up, it may not mean anything).
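To see what the \b anchors buy you, here is a small Python sketch that runs the pattern with and without them on the longjmp case. Python's re module and the names KEYWORDS, WITH_BOUNDARIES, WITHOUT_BOUNDARIES, first_match and declaration are assumptions made for this illustration.

```python
import re

KEYWORDS = r"(void|unsigned|signed|char|short|int|long|float|double)"

# The pattern from this answer, and the same pattern with the \b anchors
# removed, purely for comparison.
WITH_BOUNDARIES = re.compile(r"\b(" + KEYWORDS + r"\s*)+\b")
WITHOUT_BOUNDARIES = re.compile(r"(" + KEYWORDS + r"\s*)+")


def first_match(pattern, text):
    """Return the first match with surrounding whitespace trimmed, or None."""
    match = pattern.search(text)
    return match.group(0).strip() if match else None


declaration = "void longjmp(jmp_buf env, int val);"
print(first_match(WITH_BOUNDARIES, declaration))     # void
print(first_match(WITHOUT_BOUNDARIES, declaration))  # void long
```

Without the anchors, the long at the start of longjmp is swallowed into the type. With them, the engine backtracks to the last position that is a real word boundary.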

Upvotes: 0
