Reputation: 11337
I am writing a C# program that runs over C++ source files and looks for the following things:
#define SOMETHING_A 99
and
typedef enum {
EX_A,
EX_B,
EX_C,
EX_D,
EX_E
} Examples;
and
enum EXAMPLE2
{
EX2_A=0,
EX2_B=1,
EX2_C=2,
EX2_D=3,
EX2_LAST = EX2_D
};
My objective is to get the following list of pairs as output:
{SOMETHING_A,99}
{EX_A,0}
{EX_B,1}
..
..
{EX2_A,0}
{EX2_B,1}
..
..
Can you help me find regular expressions that match the three patterns above?
Upvotes: 1
Views: 818
If you want a solution that will work on any C++ file, use a parser instead of regexes. There are just too many possibilities to account for (different code styles, code that is commented out, etc.).
If you only want to do this on a known set of files, and they have a predictable format and style, a regex is probably ok. Actually, you are better off using several regexes:
/^#define\s+(\S+)\s+(\S+)/
This only matches define statements that are at the beginning of a line.
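As a rough illustration, here is how that pattern behaves on a small input. This is a sketch in Python, used only as a stand-in for the asker's C#; the sample text is made up, and the multiline flag is assumed so that `^` matches at every line start.

```python
import re

# Pattern from the answer. re.MULTILINE makes ^ match at the start of
# every line rather than only at the start of the whole string.
DEFINE_RE = re.compile(r"^#define\s+(\S+)\s+(\S+)", re.MULTILINE)

# Hypothetical sample input, not taken from a real code base.
source = """\
#include <stdio.h>
#define SOMETHING_A 99
  #define INDENTED 1
#define SOMETHING_B 0x10
"""

# findall returns one (name, value) tuple per match.
pairs = DEFINE_RE.findall(source)
print(pairs)
```

The indented define is skipped because the pattern requires `#` in the first column. One thing to watch for: `\s+` also matches line breaks, so a define without a value (an include guard, for example) would pick up the first token of the next line as its value.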
Here is the typedef enum:
/^\s*typedef\s+enum\s*\{[^\}]+\}[^;]+;/
(It's not clear what you want to grab from this one, so I haven't captured anything).
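Judging by the output listed in the question, the goal is probably name/index pairs. The following is a minimal sketch in Python (a stand-in for the asker's C#), assuming no enumerator in the body has an explicit initializer. The two capture groups and the `implicit_pairs` helper are additions for illustration and are not part of the pattern above.

```python
import re

# The answer's pattern with two capture groups added (an assumption about
# what is wanted): group 1 is the enum body, group 2 is the typedef name.
TYPEDEF_ENUM_RE = re.compile(
    r"^\s*typedef\s+enum\s*\{([^\}]+)\}\s*([^;\s]+)\s*;", re.MULTILINE
)

# The typedef enum from the question.
source = """\
typedef enum {
EX_A,
EX_B,
EX_C,
EX_D,
EX_E
} Examples;
"""

def implicit_pairs(body):
    # Hypothetical helper: enumerators without initializers count up
    # from 0 in C and C++, so the position in the list is the value.
    names = [part.strip() for part in body.split(",") if part.strip()]
    return [(name, index) for index, name in enumerate(names)]

match = TYPEDEF_ENUM_RE.search(source)
type_name = match.group(2)
pairs = implicit_pairs(match.group(1))
print(type_name, pairs)
```

If some enumerators do carry `= value` initializers, the counter would have to restart from that value, which this sketch does not attempt.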
And here is the enum. This is best done in two steps:
/^\s*enum\s+(\S+)\s*\{\s*([^\}]+?)\s*\}\s*;/
The first step gets the name of the enum in the first capture group and the content in the second group. Perform a regex on the second capture group to get the fields and values:
/(\S+)\s*=\s*([^\s\,]+)/
Each match of this will give you one name/value pair.
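The two steps can be sketched like this in Python (a stand-in for the asker's C#), run against the enum from the question. The `resolve` helper is an extra, hypothetical step that is not part of the answer: it assumes every value is either an integer literal or the name of an earlier enumerator.

```python
import re

# Step 1: enum name in group 1, raw body in group 2.
ENUM_RE = re.compile(r"^\s*enum\s+(\S+)\s*\{\s*([^\}]+?)\s*\}\s*;", re.MULTILINE)
# Step 2: one name/value pair per match inside the body.
FIELD_RE = re.compile(r"(\S+)\s*=\s*([^\s\,]+)")

# The enum from the question.
source = """\
enum EXAMPLE2
{
EX2_A=0,
EX2_B=1,
EX2_C=2,
EX2_D=3,
EX2_LAST = EX2_D
};
"""

def resolve(fields):
    # Hypothetical extra step: convert integer literals to ints and replace
    # references to earlier enumerators with their values.
    known = {}
    for name, raw in fields:
        known[name] = known[raw] if raw in known else int(raw, 0)
    return known

raw_fields = {}
for enum_match in ENUM_RE.finditer(source):
    enum_name, body = enum_match.groups()
    raw_fields[enum_name] = FIELD_RE.findall(body)

values = resolve(raw_fields["EXAMPLE2"])
print(raw_fields)
print(values)
```

Without `resolve`, the last pair comes back as `('EX2_LAST', 'EX2_D')`, since the second regex only sees text. Any value that is an expression (say `EX2_D + 1`) would need more work than this.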
These regexes should handle your examples, and they should do a decent job of handling the most common usage in C++ code. But they are not perfect; if you want a solution that covers all possible constructs, don't use a regex.
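To make the limitation concrete, here is a small Python sketch (a stand-in for the asker's C#) with a made-up input in which one define sits inside a block comment. The define regex has no notion of comments, so it reports both. The comment-stripping step at the end is only a rough workaround added for illustration, and it has blind spots of its own (string literals that contain `/*`, for instance).

```python
import re

DEFINE_RE = re.compile(r"^#define\s+(\S+)\s+(\S+)", re.MULTILINE)

# Hypothetical input: OLD_LIMIT is commented out and should be ignored.
source = """\
/*
#define OLD_LIMIT 10
*/
#define LIMIT 20
"""

# The regex cannot tell that the first define is inside a comment.
naive = DEFINE_RE.findall(source)

# Rough workaround (an assumption, not part of the answer): remove
# /* ... */ blocks first. re.DOTALL lets "." span line breaks.
stripped = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
filtered = DEFINE_RE.findall(stripped)

print(naive)
print(filtered)
```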
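Note: these patterns rely on ^ matching at the start of each line, so make sure the match_single_line flag is off when using them. In .NET, that means enabling RegexOptions.Multiline.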
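The effect of that setting is easy to check. The sketch below uses Python's `re` module as a stand-in for the asker's C#; there the relevant switch is `re.MULTILINE`, and the input string is made up.

```python
import re

pattern = r"^#define\s+(\S+)\s+(\S+)"

# Hypothetical input where the define is not on the first line.
source = "int x;\n#define SOMETHING_A 99\n"

# Without the flag, ^ only matches at the very start of the string.
without_flag = re.findall(pattern, source)

# With the flag, ^ matches after every line break as well.
with_flag = re.findall(pattern, source, re.MULTILINE)

print(without_flag)
print(with_flag)
```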
Upvotes: 2