Reputation: 435
I'm trying to write a regular expression that gives me only the return type of any (see edit) C function in a C file, ignoring spaces and newlines, but I'm not having any luck with it. Edit: The return types I have to consider are only basic C data types.
Example:
signed
long long
int function1 ( int j, int n)
should give me:
signed long long int
How can I write (or think of a solution for) this regular expression?
Upvotes: 0
Views: 162
Reputation: 46375
The hardest part of the problem is probably answering the question: "how can I tell that I have reached the start of a function definition". Given the various rules of C, it's not clear that there is a "sure fire" answer - so the best you can probably do is come up with a rule that catches "most" situations.
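Function definitions will have a return type made up of one or more of the following tokens (void, signed, unsigned, short, long, char, int, float, double, *), followed by the function name and an opening parenthesis.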
This means something like this should work: (demo: http://regex101.com/r/oJ3xS5 )
((?:(?:void|unsigned|signed|long|short|float|double|int|char|\*)(?:\s*))+)(\w+)\s*\(
Note - this does not "clean up the formatting" - so a return value definition that spans multiple lines will still do so. It does have the advantage (compared to other solutions) that it looks specifically for the basic types that are defined in the link in your question.
Also note - you need the g flag to capture all the instances; and I capture the function name itself in its own capturing group (\w+). If you don't want / need that, you can leave out the parentheses. But I thought that having both the return type and the function name might be useful.
Afterthought: if you first strip out multiple white spaces and returns, the above will still work but now there will be no extraneous white space in the return value. For instance you could run your code through
cat source.c | tr '\n' ' ' | sed -E 's/[[:space:]]+/ /g' > strippedSource.c
then process with the regex above.
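For anyone who wants to try this outside regex101, here is a minimal Python sketch of the two steps (collapse whitespace, then apply the pattern). The pattern is copied unchanged from above; the function name find_functions and the sample string are just made up for illustration, and re.finditer plays the role of the g flag.

```python
import re

# Pattern copied from the answer: group 1 is the return type,
# group 2 is the function name.
PATTERN = re.compile(
    r"((?:(?:void|unsigned|signed|long|short|float|double|int|char|\*)(?:\s*))+)(\w+)\s*\("
)


def find_functions(source):
    """Return (return_type, name) pairs found in C source text.

    Hypothetical helper: whitespace runs (including newlines) are
    collapsed to single spaces before matching.
    """
    flat = re.sub(r"\s+", " ", source)
    # finditer yields every non-overlapping match, like the g flag.
    return [(m.group(1).strip(), m.group(2)) for m in PATTERN.finditer(flat)]


sample = "signed\nlong long\nint function1 ( int j, int n)"
print(find_functions(sample))  # [('signed long long int', 'function1')]
```

One caveat this makes easy to see: the pattern has no word boundary in front of the type keywords, so a call such as printf("x"); is also reported, as return type int and name f. Putting \b in front of the keyword group should avoid that particular false positive, though I have only checked it on small inputs like this one.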
Upvotes: 1
Reputation: 22457
Concatenate all words using the OR operator:
\b((void|unsigned|signed|char|short|int|long|float|double)\s*)+\b
The \b at start and end are to prevent partial function names popping up (void longjmp comes to mind).
This will not catch typedefs such as uchar_8, or complicated pointer-to-pointer constructions such as void (* int)(*) (I just made this up, it may not mean anything).
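To see what the \b buys you, here is a small Python sketch comparing the pattern above with and without the boundaries on the longjmp case. The helper first_type and the sample line are made up for this illustration.

```python
import re

# The pattern from this answer, and the same pattern with the \b removed.
WITH_BOUNDARY = re.compile(
    r"\b((void|unsigned|signed|char|short|int|long|float|double)\s*)+\b"
)
WITHOUT_BOUNDARY = re.compile(
    r"((void|unsigned|signed|char|short|int|long|float|double)\s*)+"
)


def first_type(pattern, text):
    """Return the first run of type keywords found in text, or None."""
    m = pattern.search(text)
    return m.group(0).strip() if m else None


line = "void longjmp(jmp_buf env, int val);"
print(first_type(WITH_BOUNDARY, line))     # void
print(first_type(WITHOUT_BOUNDARY, line))  # void long
```

Without the trailing \b, the "long" at the start of longjmp gets swallowed into the return type. With it, the engine backtracks to the last position that really is a word boundary.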
Upvotes: 0