Reputation: 68740
I'm somewhere on the learning curve when it comes to regular expressions, and I need to use them to automatically modify function prototypes in a bunch of C headers. Does anyone know of a decent regular expression to find any and all function prototypes in a C header, while excluding everything else?
Edit: Three things that weren't clear initially:
Upvotes: 27
Views: 28742
Reputation: 54
A one-liner regex sounds very hard. I personally use a Perl script for that. It's kind of easy. The basic approach is:
Upvotes: 0
Reputation: 126
Based on 9999years' solution above, I developed, for my own purposes, the following regex, which captures return types, function names and arguments into groups, up to the first opening curly bracket, regardless whether occurring
^((\w+[\(\w+\)]([ |\t]+)?(\R)?){2,})(\([^!@#;$+%^]+?([ |\t]+)?(\R)?\))(( +)?([ |\t]+)?(\R)?\{)
^((\w+[\(\w+\)]([ |\t|\r]+)?([\r])?){2,})(\([^!@#;$+%^]+?([ |\t|\r]+)?\))(( +)?([ |\t]+)?[^>](\r)?\{)
Upvotes: 0
Reputation: 3780
As a continuation of Dean TH's great answer:
This will find
^([\w\*]+( )*?){2,}\(([^!@#$+%^;]+?)\)(?!\s*;)
Upvotes: 0
Reputation: 51
Here's a regular expression that's a good starting point for finding C function names:
^\s*(?:(?:inline|static)\s+){0,2}(?!else|typedef|return)\w+\s+\*?\s*(\w+)\s*\([^0]+\)\s*;?
And these are some test cases to validate the expression:
// good cases
static BCB_T *UsbpBufCtrlRemoveBack (BCB_Q_T *pBufCtrl);
inline static AT91_REG *UDP_EpIER (UDP_ENDPOINT_T *pEndpnt);
int UsbpEnablePort (USBP_CTRL_T *pCtrl)
bool_t IsHostConnected(void)
inline AT91_REG *UDP_EpCSR (UDP_ENDPOINT_T *pEndpnt)
// shouldn't match
typedef void (*pfXferCB)(void *pEndpnt, uint16_t Status);
else if (bIsNulCnt && bIsBusyCnt)
return UsbpDump(Buffer, BufSize, Option);
Finally, here's a simple Tcl script that reads through a file and extracts all the function prototypes and function names.
set fh [open "usbp.c" r]
set contents [read $fh]
close $fh
set fileLines [split $contents \n]
set lineNum 0
set funcCount 0
set funcRegexp {^\s*(?:(?:inline|static)\s+){0,2}(?!else|typedef|return)\w+\s+\*?\s*(\w+)\s*\([^0]+\)\s*;?}
foreach line $fileLines {
    incr lineNum
    if {[regexp $funcRegexp $line -> funcName]} {
        puts "line:$lineNum, $funcName"
        incr funcCount
    }; #end if
}; #end foreach
puts "$funcCount functions found."
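The test cases above can also be checked mechanically. The following is a minimal sketch in Python rather than Tcl, assuming Python's re module treats this pattern the same way for these inputs. The helper name find_function_name is hypothetical and not part of the original answer.

```python
import re

# Pattern copied verbatim from the answer above.
FUNC_RE = re.compile(
    r"^\s*(?:(?:inline|static)\s+){0,2}(?!else|typedef|return)"
    r"\w+\s+\*?\s*(\w+)\s*\([^0]+\)\s*;?"
)

GOOD_CASES = [
    "static BCB_T *UsbpBufCtrlRemoveBack (BCB_Q_T *pBufCtrl);",
    "inline static AT91_REG *UDP_EpIER (UDP_ENDPOINT_T *pEndpnt);",
    "int UsbpEnablePort (USBP_CTRL_T *pCtrl)",
    "bool_t IsHostConnected(void)",
    "inline AT91_REG *UDP_EpCSR (UDP_ENDPOINT_T *pEndpnt)",
]

BAD_CASES = [
    "typedef void (*pfXferCB)(void *pEndpnt, uint16_t Status);",
    "else if (bIsNulCnt && bIsBusyCnt)",
    "return UsbpDump(Buffer, BufSize, Option);",
]


def find_function_name(line):
    """Return the captured function name, or None if the line does not match."""
    m = FUNC_RE.match(line)
    return m.group(1) if m else None


# Print what the pattern captures for every test line.
for line in GOOD_CASES + BAD_CASES:
    print(f"{find_function_name(line)!s:>24}  <-  {line}")
```

The negative lookahead is what rejects the "else if (...)" and "return ...(...)" lines; without it, "else" would be taken as the return type and "if" as the function name.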
Upvotes: 5
Reputation: 1653
Assuming your code is formatted something like
type name function_name(variables **here, variables &here)
{
code
}
Here’s a one-liner for PowerShell:
ls *.c, *.h | sls "^(\w+( )?){2,}\([^!@#$+%^]+?\)"
Which returns results like:
...
common.h:37:float max(float a, float b)
common.h:42:float fclamp(float val, float fmin, float fmax)
common.h:51:float lerp(float a, float b, float b_interp)
common.h:60:float scale(float val, float valmin, float valmax, float min,
float max)
complex.h:3:typedef struct complex {
complex.h:8:double complexabs(complex in)
complex.h:13:void complexmult(complex *out, complex a, complex b)
complex.h:20:void complexadd(complex *out, complex a, complex b)
complex.h:27:int mandlebrot(complex c, int i)
...
To see just the line without the file specifics, add format-table -property line (or abbreviated as ft -p line):
ls *.c, *.h | sls "^(\w+( )?){2,}\([^!@#$+%^]+?\)" | format-table -p line
Which returns:
Line
----
void render(SDL_Surface *screen)
void saveframe(SDL_Surface *screen)
int handleevents(SDL_Surface *screen)
int WinMain(/*int argc, char* args[]*/)
void printscreen(SDL_Surface *screen, unsigned int exclude)
void testsection(char name[])
void sdltests(SDL_Surface *screen, SDL_Window *window, int width, int height)
int WinMain(/*int argc, char *argv[]*/)
int random(int min, int max) {
int main(int argc, char *argv[])
BONUS: Explanation of the regex:
^(\w+(\s+)?){2,}\([^!@#$+%^]+?\)
^                                   Start of a line
 (         ){2,}                    Create an atom that appears two or more times
                                    (as many as possible)
  \w+(\s+)?                         A group of word characters followed by
                                    optional whitespace
                \(            \)    Literal parentheses containing
                  [^!@#$+%^]+?      A group of 1 or more characters (as few as
                                    possible) that AREN'T in “!@#$+%^”
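To make the breakdown above concrete, here is a hedged sketch that runs the same pattern through Python's re module instead of Select-String. The first three sample lines come from the output shown earlier; the remaining ones are hypothetical inputs added for illustration, as is the helper name looks_like_function.

```python
import re

# The explained pattern, unchanged. Inside the character class, "$", "+"
# and the trailing "^" are literal characters; the leading "^" negates.
PROTO_RE = re.compile(r"^(\w+(\s+)?){2,}\([^!@#$+%^]+?\)")


def looks_like_function(line):
    """True if the line starts with 2+ word groups followed by (...)."""
    return PROTO_RE.search(line) is not None


samples = [
    "float max(float a, float b)",
    "void complexmult(complex *out, complex a, complex b)",
    "int main(int argc, char *argv[])",
    "#define LIMIT 10",      # hypothetical: '#' is not a word character
    "total = count + 1;",    # hypothetical: no parenthesised list at all
]

for s in samples:
    print(looks_like_function(s), s)
```

Note that a call at the start of a line, such as printf(x), also matches, because \w+ can split a single word into two repetitions and so satisfy {2,}. The hits are best treated as candidates to review, not a definitive list.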
Upvotes: 5
Reputation: 1
Let's say you have the whole C file read into $buffer.
* First, create a regexp that replaces all comments with an equal number of spaces and linefeeds, so that row and column positions won't change.
* Create a regexp that can handle a parenthesised string.
* Then a regexp like this finds functions: (static|)\s+(\w+)\s*$parenthezized_regexp+*{
This regexp does not handle functions whose definitions use preprocessor directives.
If you go for lex/yacc, you have to combine the ANSI C and preprocessor grammars to handle those preprocessor directives inside function definitions.
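The first step, blanking out comments without shifting row and column positions, could be sketched as follows. This is an illustration in Python, not the author's code. The name blank_comments is hypothetical, and the sketch assumes no trigraphs and no // comments continued with a trailing backslash.

```python
import re

# String and character literals are matched first so that comment markers
# inside them are left alone; only the last two alternatives are comments.
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\\n])*"'      # string literal
    r"|'(?:\\.|[^'\\\n])*'"     # character literal
    r"|/\*.*?\*/"               # block comment, may span lines
    r"|//[^\n]*",               # line comment
    re.DOTALL,
)


def blank_comments(source):
    """Replace every comment with spaces, keeping newlines in place."""
    def repl(match):
        text = match.group(0)
        if text.startswith("/"):
            # Comment: keep each newline, turn everything else into a space.
            return re.sub(r"[^\n]", " ", text)
        return text  # literal: return unchanged
    return TOKEN_RE.sub(repl, source)


src = (
    "int f(int a); /* old:\n"
    " int g(void); */\n"
    "int h(void); // trailing\n"
    'char *s = "/* not a comment */";\n'
)
print(blank_comments(src))
```

Because the output has the same length and the same line breaks as the input, any match position found afterwards maps directly back to the original file.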
Upvotes: 0
Reputation: 14895
To do this properly, you'll need to parse according to the C language grammar. But if this is for the C language only and for header files only, perhaps you can take some shortcuts and get by without full blown BNF.
^
\s*
(unsigned|signed)?
\s+
(void|int|char|short|long|float|double) # return type
\s+
(\w+) # function name
\s*
\(
[^)]* # args - total cop out
\)
\s*
;
This is by no means correct, and needs work. But it could represent a starting point, if you're willing to put in some effort and improve it. It can be broken by function definitions that span lines, function pointer arguments, macros and probably many other things.
Note that BNF can be converted to a regex. It will be a big, complex regex, but it's doable.
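A hedged sketch of how the commented pattern above might be run, using Python's re.VERBOSE flag so that the layout and comments carry over. One adjustment is an assumption and not part of the original pattern: the signed/unsigned prefix and the whitespace after it are made optional together, because as written the pattern requires at least one whitespace character before the return type even when no prefix is present. The function name find_prototypes and the sample header text are hypothetical.

```python
import re

PROTO_RE = re.compile(
    r"""
    ^
    \s*
    (?:(unsigned|signed)\s+)?                 # optional signedness (adjusted)
    (void|int|char|short|long|float|double)   # return type
    \s+
    (\w+)                                     # function name
    \s*
    \(
    ([^)]*)                                   # args - still a cop out
    \)
    \s*
    ;
    """,
    re.VERBOSE | re.MULTILINE,
)


def find_prototypes(header_text):
    """Return (return_type, name, args) for each simple prototype found."""
    results = []
    for m in PROTO_RE.finditer(header_text):
        sign, rtype, name, args = m.groups()
        full_type = f"{sign} {rtype}" if sign else rtype
        results.append((full_type, name, args.strip()))
    return results


header = (
    "int add(int a, int b);\n"
    "unsigned long count(void);\n"
    "struct foo *make(void);\n"   # not matched: return type is not in the list
    "static int x = 3;\n"         # not matched: not a prototype
)
for proto in find_prototypes(header):
    print(proto)
```

The unmatched struct prototype shows the limits the answer already admits: any return type outside the hard-coded list, including pointers and typedef names, is skipped.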
Upvotes: 12
Reputation: 753695
For a one-off exercise, you'd probably do best by starting simple and looking at the code you have to scan. Pick the three worst headers and generate a regex or series of regexes that do the job. You have to decide whether and how you are going to deal with comments that contain function declarations (and, indeed, with function declarations that contain comments). Dealing with:
extern void (*function(int, void (*)(int)))(int);
(which could be the Standard C function signal()) is tough in a regex because of the nested parentheses. If you don't have any such function prototypes, time spent working out how to deal with them is time wasted. Similar comments apply to pointers to multi-dimensional arrays. The chances are that you have stylistic conventions to simplify your life. You may not use C99 (C++-style) comments, in which case you don't need to code around them. You probably don't put multiple declarations in a single line, either with or without a common type, so you don't have to deal with declarations like this:
extern int func1(int), func2(double); double func3(int); // Nasty!
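To illustrate why the nested parentheses in the signal()-style declaration are hard for a flat pattern, here is a hedged sketch in Python. The helper matching_paren is hypothetical; it counts nesting depth instead of using a regex, and it assumes comments and string literals have already been dealt with.

```python
import re

DECL = "extern void (*function(int, void (*)(int)))(int);"


def matching_paren(text, open_index):
    """Return the index of the ')' closing the '(' at open_index, or -1."""
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


# A flat "anything but ')'" pattern stops at the first ')' it meets,
# which is in the middle of the declarator.
flat = re.search(r"\([^)]*\)", DECL)
print(flat.group(0))

# Depth counting recovers the whole parenthesised declarator instead.
start = DECL.index("(")
end = matching_paren(DECL, start)
print(DECL[start:end + 1])
```

If none of your headers contain declarations like this, the flat pattern is enough and the extra scanning step can be skipped.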
Upvotes: 7