dsimcha

Reputation: 68740

Regex to pull out C function prototype declarations?

I'm somewhere on the learning curve when it comes to regular expressions, and I need to use them to automatically modify function prototypes in a bunch of C headers. Does anyone know of a decent regular expression to find any and all function prototypes in a C header, while excluding everything else?

Edit: Three things that weren't clear initially:

  1. I do not care about C++, only straight C. This means no templates, etc. to worry about.
  2. The solution must work with typedefs and structs, no limiting to only basic C types.
  3. This is kind of a one-off thing. It does not need to be pretty. I do not care how much of a kludge it is as long as it works, but I don't want a complex, hard to implement solution.

Upvotes: 27

Views: 28742

Answers (9)

marc

Reputation: 54

A one-line regex sounds very hard. I personally use a Perl script for that, and it's fairly easy. The basic approach is:

  1. Call your favorite C preprocessor to eliminate comments and expand macros, which makes the later steps easier.
  2. Count the '{' and '}' symbols. In plain C the braces behave predictably (a function body opens at nesting depth zero), which lets you detect function names.
  3. Look up the function names in the original source (before preprocessing) to get the signature that still has its typedefs.

It is an inefficient approach, but it works quite well for me. Step 1 is not strictly necessary, but it will make your life easier.
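
The brace-counting idea in step 2 can be sketched in Python. This is only a rough illustration, not the Perl script mentioned above: it assumes comments can be stripped with a simple regex instead of a real preprocessor run, it ignores braces and comment markers inside string literals, it does not handle function-pointer parameters, and the names strip_comments and find_function_names are made up for the example.

```python
import re

def strip_comments(src):
    # Crude stand-in for a preprocessor pass: drops /* ... */ and // comments.
    # Assumption: no comment markers appear inside string literals.
    return re.sub(r"/\*.*?\*/|//[^\n]*", " ", src, flags=re.DOTALL)

def find_function_names(src):
    # Walk the text while tracking brace depth; a '{' at depth 0 that directly
    # follows "name(...)" is taken to open a function body.
    text = strip_comments(src)
    names = []
    depth = 0
    chunk_start = 0  # start of the current top-level declaration
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                header = text[chunk_start:i]
                m = re.search(r"(\w+)\s*\([^()]*\)\s*$", header)
                if m:
                    names.append(m.group(1))
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                chunk_start = i + 1
        elif ch == ";" and depth == 0:
            chunk_start = i + 1
    return names

sample = """
typedef struct point { int x; int y; } point_t;
/* int fake(void) { } */
static int add(int a, int b)
{
    if (a) { return a + b; }
    return b;
}
void reset(point_t *p) { p->x = 0; }
int declared_only(int);
"""
print(find_function_names(sample))
```

The struct body and the prototype ending in ';' are skipped because neither puts a '{' right after a parameter list at depth zero.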

Upvotes: 0

pedda

Reputation: 126

Based on 9999years' solution above, I developed the following regex for my own purposes. It captures return types, function names, and arguments into groups, up to the first opening curly bracket, regardless of whether tabs, spaces, or line feeds (carriage returns) occur at logically possible places.

  • PCRE dialect:
^((\w+[\(\w+\)]([ |\t]+)?(\R)?){2,})(\([^!@#;$+%^]+?([ |\t]+)?(\R)?\))(( +)?([ |\t]+)?(\R)?\{)
  • .NET (C#):
^((\w+[\(\w+\)]([ |\t|\r]+)?([\r])?){2,})(\([^!@#;$+%^]+?([ |\t|\r]+)?\))(( +)?([ |\t]+)?[^>](\r)?\{)

Upvotes: 0

Raz Luvaton

Reputation: 3780

Continuing from Dean TH's great answer, this will find:

  • Only function definitions, not declarations as well
  • Functions that return pointers

^([\w\*]+( )*?){2,}\(([^!@#$+%^;]+?)\)(?!\s*;)
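
A quick way to see the effect of the trailing negative lookahead is to run the pattern over a small buffer in Python. This is a minimal sketch, assuming Python's re module and the MULTILINE flag so that ^ matches at every line start; the sample source is made up for illustration.

```python
import re

# Same pattern as above; MULTILINE makes ^ match at the start of each line.
DEFINITION = re.compile(
    r"^([\w\*]+( )*?){2,}\(([^!@#$+%^;]+?)\)(?!\s*;)", re.MULTILINE
)

source = """\
int add(int a, int b);
int add(int a, int b)
{
    return a + b;
}
char *name(void)
{
    return 0;
}
"""

matches = [m.group(0) for m in DEFINITION.finditer(source)]
print(matches)
```

The prototype on the first line is rejected because a ';' follows its closing parenthesis, while the two definitions are kept. The nested repetition in the first group can backtrack heavily on long lines that fail to match, so this is best suited to small inputs.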

Upvotes: 0

Dean TH

Reputation: 51

Here's a regular expression that's a good starting point for finding C function names:

^\s*(?:(?:inline|static)\s+){0,2}(?!else|typedef|return)\w+\s+\*?\s*(\w+)\s*\([^0]+\)\s*;?

And these are some test cases to validate the expression:

// good cases
static BCB_T   *UsbpBufCtrlRemoveBack   (BCB_Q_T *pBufCtrl);
inline static AT91_REG *UDP_EpIER               (UDP_ENDPOINT_T *pEndpnt);
int UsbpEnablePort (USBP_CTRL_T *pCtrl)
bool_t IsHostConnected(void)
inline AT91_REG *UDP_EpCSR (UDP_ENDPOINT_T *pEndpnt)

// shouldn't match
typedef void (*pfXferCB)(void *pEndpnt, uint16_t Status);
    else if (bIsNulCnt && bIsBusyCnt)
            return UsbpDump(Buffer, BufSize, Option);

Finally, here's a simple TCL script to read through a file and extract all the function prototypes and function names.

set fh [open "usbp.c" r]
set contents [read $fh]
close $fh
set fileLines [split $contents \n]
set lineNum 0
set funcCount 0
set funcRegexp {^\s*(?:(?:inline|static)\s+){0,2}(?!else|typedef|return)\w+\s+\*?\s*(\w+)\s*\([^0]+\)\s*;?}
foreach line $fileLines {
    incr lineNum
    if {[regexp $funcRegexp $line -> funcName]} {
        puts "line:$lineNum, $funcName"
        incr funcCount
    }; #end if

}; #end foreach
puts "$funcCount functions found."
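
For readers without Tcl, the test cases can also be checked with a short Python sketch. It assumes Python's re module accepts the pattern unchanged; the helper name function_name and the GOOD and BAD lists are just illustrative names for the cases listed above.

```python
import re

# The same expression, split across two string literals for readability.
FUNC = re.compile(
    r"^\s*(?:(?:inline|static)\s+){0,2}(?!else|typedef|return)"
    r"\w+\s+\*?\s*(\w+)\s*\([^0]+\)\s*;?"
)

def function_name(line):
    # Return the captured function name, or None if the line does not match.
    m = FUNC.match(line)
    return m.group(1) if m else None

GOOD = [
    "static BCB_T   *UsbpBufCtrlRemoveBack   (BCB_Q_T *pBufCtrl);",
    "inline static AT91_REG *UDP_EpIER               (UDP_ENDPOINT_T *pEndpnt);",
    "int UsbpEnablePort (USBP_CTRL_T *pCtrl)",
    "bool_t IsHostConnected(void)",
    "inline AT91_REG *UDP_EpCSR (UDP_ENDPOINT_T *pEndpnt)",
]
BAD = [
    "typedef void (*pfXferCB)(void *pEndpnt, uint16_t Status);",
    "    else if (bIsNulCnt && bIsBusyCnt)",
    "            return UsbpDump(Buffer, BufSize, Option);",
]

for line in GOOD + BAD:
    print(function_name(line), "<-", line.strip())
```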

Upvotes: 5

9999years

Reputation: 1653

Assuming your code is formatted something like

type name function_name(variables **here, variables &here)
{
    code
}

Here’s a one-liner for PowerShell:

ls *.c, *.h | sls "^(\w+( )?){2,}\([^!@#$+%^]+?\)"

Which returns results like:

...
common.h:37:float max(float a, float b)
common.h:42:float fclamp(float val, float fmin, float fmax)
common.h:51:float lerp(float a, float b, float b_interp)
common.h:60:float scale(float val, float valmin, float valmax, float min,
float max)
complex.h:3:typedef struct complex {
complex.h:8:double complexabs(complex in)
complex.h:13:void complexmult(complex *out, complex a, complex b)
complex.h:20:void complexadd(complex *out, complex a, complex b)
complex.h:27:int mandlebrot(complex c, int i)
...

To see just the line without the file specifics, add format-table -property line (or abbreviated as ft -p line):

ls *.c, *.h | sls "^(\w+( )?){2,}\([^!@#$+%^]+?\)" | format-table -p line

Which returns:

Line
----
void render(SDL_Surface *screen)
void saveframe(SDL_Surface *screen)
int handleevents(SDL_Surface *screen)
int WinMain(/*int argc, char* args[]*/)
void printscreen(SDL_Surface *screen, unsigned int exclude)
void testsection(char name[])
void sdltests(SDL_Surface *screen, SDL_Window *window, int width, int height)
int WinMain(/*int argc, char *argv[]*/)
int random(int min, int max) {
int main(int argc, char *argv[])

BONUS: Explanation of the regex:

^(\w+(\s+)?){2,}\([^!@#$+%^]+?\)
^                                Start of a line
 (         ){2,}                 Create an atom that appears two or more times
                                 (as many as possible)
  \w+(\s+)?                      A group of word characters followed by
                                 optional whitespace
                \(            \) Literal parentheses containing
                  [^!@#$+%^]+?   A group of 1 or more characters
                                 (as few as possible) that AREN'T
                                 in “!@#$+%^”

Upvotes: 5

ihu

Reputation: 1

Let's say you have the whole C file read into $buffer.

  • First, create a regexp that replaces all comments with an equal number of spaces and linefeeds, so that row and column positions won't change.
  • Create a regexp that can handle a parenthesised string.
  • Then a regexp like this finds functions: (static|)\s+(\w+)\s*$parenthezized_regexp+*{

This regexp does not handle functions whose definitions use preprocessor directives.

If you go for lex/yacc, you have to combine the ANSI C and preprocessor grammars to handle those preprocessor directives inside function definitions.
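
The first step, blanking comments without shifting positions, might look like this in Python. It is a minimal sketch under the assumption that comment markers never appear inside string literals; blank_comments and the sample buffer are hypothetical names chosen for illustration.

```python
import re

# Matches /* ... */ block comments (across lines) and // line comments.
COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

def blank_comments(buffer):
    # Replace every character of a comment except newlines with a space,
    # so offsets, rows, and columns in the result match the original.
    return COMMENT.sub(lambda m: re.sub(r"[^\n]", " ", m.group(0)), buffer)

buffer = "int a; /* x\ny */ int b; // z\nint c;"
blanked = blank_comments(buffer)
print(blanked)
```

Because the output has the same length and the same line breaks as the input, any match found in the blanked text can be reported at the same row and column in the original file.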

Upvotes: 0

Paul Beckingham

Reputation: 14895

To do this properly, you'll need to parse according to the C language grammar. But if this is for the C language only and for header files only, perhaps you can take some shortcuts and get by without full-blown BNF.

^
\s*
(?:(unsigned|signed)\s+)?                # optional sign qualifier
(void|int|char|short|long|float|double)  # return type
\s+
(\w+)                                    # function name
\s*
\(
[^)]*                                    # args - total cop out
\)
\s*
;

This is by no means correct, and needs work. But it could represent a starting point, if you're willing to put in some effort and improve it. It can be broken by function definitions that span lines, function-pointer arguments, macros, and probably many other things.
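
As a rough illustration, a commented, multi-line pattern of this kind can be run with Python's re.VERBOSE flag, which ignores whitespace and # comments inside the pattern. The sketch treats the sign qualifier together with its trailing whitespace as one optional unit, so an unqualified type at the start of a line can match. The sample header is made up.

```python
import re

PROTOTYPE = re.compile(r"""
    ^\s*
    (?:(unsigned|signed)\s+)?                  # optional sign qualifier
    (void|int|char|short|long|float|double)    # return type
    \s+
    (\w+)                                      # function name
    \s*
    \( [^)]* \)                                # args - still a cop out
    \s*
    ;
""", re.VERBOSE | re.MULTILINE)

header = """\
int add(int a, int b);
unsigned long hash(const char *s);
  double scale(double v);
typedef int (*cb_t)(int);
struct point make_point(int x, int y);
"""

names = [m.group(3) for m in PROTOTYPE.finditer(header)]
print(names)
```

The last line of the sample is missed because the return type is a struct, which shows why a fixed list of basic types does not meet the typedef and struct requirement in the question.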

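Note that a BNF grammar can be converted to a regex only where it is not truly recursive. The result will be a big, complex regex that cannot follow arbitrarily nested parentheses, but for simple headers it's doable.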

Upvotes: 12

Jonathan Leffler

Reputation: 753695

For a one-off exercise, you'd probably do best by starting simple and looking at the code you have to scan. Pick the three worst headers and generate a regex or series of regexes that does the job. You have to decide whether and how you are going to deal with comments that contain function declarations (and, indeed, with function declarations that contain comments). Dealing with:

extern void (*function(int, void (*)(int)))(int);

(which could be the Standard C function signal()) is tough in a regex because of the nested parentheses. If you don't have any such function prototypes, time spent working out how to deal with them is time wasted. Similar comments apply to pointers to multi-dimensional arrays. The chances are that you have stylistic conventions that simplify your life. If you do not use C99 (C++-style) comments, you don't need to code around them. You probably don't put multiple declarations in a single line, either with or without a common type, so you don't have to deal with that either.

extern int func1(int), func2(double); double func3(int);  // Nasty!
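
To make the nested-parentheses problem concrete, here is a small Python sketch that counts parenthesis depth instead of relying on a pattern such as [^)]*. The helper name matching_paren is made up for illustration, and the scan assumes there are no parentheses inside comments or string literals.

```python
import re

def matching_paren(text, open_index):
    # Return the index of the ')' that closes the '(' at open_index.
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses")

decl = "extern void (*function(int, void (*)(int)))(int);"

# Depth counting recovers the full parameter list of function().
start = decl.index("function") + len("function")
end = matching_paren(decl, start)
params = decl[start + 1:end]

# A naive regex stops at the first ')' it meets.
naive = re.search(r"function\(([^)]*)\)", decl).group(1)

print(params)
print(naive)
```

The depth counter yields the whole parameter list, while the naive pattern cuts it off inside the function-pointer parameter.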

Upvotes: 7

Quassnoi

Reputation: 425341

You may implement a parser using an ANSI C yacc/lex grammar.

Upvotes: 18
