ItayB

Reputation: 11337

Using C# Regex to find C++ code patterns (defines & enums)

I am writing a C# program that runs over C++ source files and looks for the following constructs:

 #define SOMETHING_A    99

and

typedef enum {
  EX_A,
  EX_B,
  EX_C,
  EX_D,
  EX_E
} Examples;

and

enum EXAMPLE2
{
    EX2_A=0,
    EX2_B=1,
    EX2_C=2,
    EX2_D=3,
    EX2_LAST = EX2_D
};

My objective is to get the following list of pairs as output:

{SOMETHING_A,99}
{EX_A,0}
{EX_B,1}
..
..
{EX2_A,0}
{EX2_B,1}
..
..

Can you help me find the correct regular expressions to match the above 3 patterns?

Upvotes: 1

Views: 818

Answers (1)

user1919238


If you want a solution that will work on any C++ file, use a parser instead of regexes. There are just too many possibilities to account for (different code styles, code that is commented out, etc.).

If you only want to do this on a known set of files, and they have a predictable format and style, a regex is probably ok. Actually, you are better off using several regexes:

/^#define\s+(\S+)\s+(\S+)/

This only matches define statements that are at the beginning of a line.
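As a rough sketch of how this pattern might be used from C#, the snippet below runs it over a small in-memory string. The sample text and the variable names are made up for illustration, and it assumes C# 9 or later for top-level statements. Function-like macros and defines without a value are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample input; a real program would read this from a file.
string source = "#define SOMETHING_A    99\nint x = 0;\n#define SOMETHING_B 0x10\n";

// Multiline lets ^ match at the start of every line, not only at the start of the string.
var defineRegex = new Regex(@"^#define\s+(\S+)\s+(\S+)", RegexOptions.Multiline);

var defines = new List<KeyValuePair<string, string>>();
foreach (Match m in defineRegex.Matches(source))
{
    // Group 1 is the macro name, group 2 is the first token of its value.
    defines.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
}

foreach (var d in defines)
{
    Console.WriteLine("{" + d.Key + "," + d.Value + "}");
}
// Prints:
// {SOMETHING_A,99}
// {SOMETHING_B,0x10}
```

The values are kept as strings here, since a define can hold hex literals or arbitrary expressions.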

Here is the typedef enum:

/^\s*typedef\s+enum\s*\{[^\}]+\}[^;]+;/

(It's not clear what you want to grab from this one, so I haven't captured anything).
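If the goal is the {name,value} pairs from the question, one possible approach is to capture the enumerator list and number the entries yourself, since C++ gives the first enumerator the value 0 and each following one the previous value plus 1 when there is no initializer. The sketch below uses a variation of the pattern above with two capture groups added (they are not part of the original pattern), and it assumes none of the enumerators has an explicit `= value`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string source = "typedef enum {\n  EX_A,\n  EX_B,\n  EX_C,\n  EX_D,\n  EX_E\n} Examples;\n";

// Variation on the pattern above: group 1 = enumerator list, group 2 = typedef name.
var typedefEnumRegex = new Regex(
    @"^\s*typedef\s+enum\s*\{([^\}]+)\}\s*(\w+)\s*;",
    RegexOptions.Multiline);

var pairs = new List<KeyValuePair<string, int>>();
foreach (Match m in typedefEnumRegex.Matches(source))
{
    int next = 0; // implicit numbering restarts at 0 for every enum
    foreach (string item in m.Groups[1].Value.Split(','))
    {
        string name = item.Trim();
        if (name.Length == 0) continue; // tolerate a trailing comma
        pairs.Add(new KeyValuePair<string, int>(name, next++));
    }
}

foreach (var p in pairs)
{
    Console.WriteLine("{" + p.Key + "," + p.Value + "}");
}
// Prints {EX_A,0} through {EX_E,4}, one pair per line.
```

Comments inside the enum body would end up in the names, so this only suits files with a known, clean style.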

And here is the enum. This is best done in two steps:

/^\s*enum\s+(\S+)\s*\{\s*([^\}]+?)\s*\}\s*;/

The first step gets the name of the enum in the first capture group and the content in the second group. Perform a regex on the second capture group to get the fields and values:

/(\S+)\s*=\s*([^\s\,]+)/

Each match of this will give you one name/value pair.
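Put together, the two steps might look like the following in C#. This is only a sketch: the variable names are made up, top-level statements (C# 9 or later) are assumed, and the lookup that turns `EX2_LAST = EX2_D` into a number is an extra not covered by the patterns themselves.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string source =
    "enum EXAMPLE2\n{\n    EX2_A=0,\n    EX2_B=1,\n    EX2_C=2,\n    EX2_D=3,\n    EX2_LAST = EX2_D\n};\n";

// Step 1: enum name in group 1, body in group 2.
var enumRegex = new Regex(@"^\s*enum\s+(\S+)\s*\{\s*([^\}]+?)\s*\}\s*;", RegexOptions.Multiline);
// Step 2: one name/value pair per match inside the body.
var fieldRegex = new Regex(@"(\S+)\s*=\s*([^\s,]+)");

var pairs = new List<KeyValuePair<string, string>>();
var known = new Dictionary<string, string>(); // values seen so far, for resolving aliases

foreach (Match e in enumRegex.Matches(source))
{
    foreach (Match f in fieldRegex.Matches(e.Groups[2].Value))
    {
        string name = f.Groups[1].Value;
        string value = f.Groups[2].Value;
        // Extra step (an assumption, not part of the patterns): if the value
        // names an earlier enumerator, substitute that enumerator's value.
        if (known.TryGetValue(value, out var resolved)) value = resolved;
        known[name] = value;
        pairs.Add(new KeyValuePair<string, string>(name, value));
    }
}

foreach (var p in pairs)
{
    Console.WriteLine("{" + p.Key + "," + p.Value + "}");
}
// Prints:
// {EX2_A,0}
// {EX2_B,1}
// {EX2_C,2}
// {EX2_D,3}
// {EX2_LAST,3}
```

Enumerators without an `=` are skipped by the second pattern, so mixed bodies would need the implicit numbering handled separately.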

These regexes should handle your examples, and they should do a decent job of handling the most common usage in C++ code. But they are not perfect; if you want a solution that covers all possible constructs, don't use a regex.

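Note: these patterns assume that ^ matches at the start of every line, not only at the start of the input. In .NET that means passing RegexOptions.Multiline; RegexOptions.Singleline only changes what . matches, and none of these patterns use it.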
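In .NET terms, the option that affects ^ is RegexOptions.Multiline. A small check, using a made-up input string, shows the difference when a whole file is matched at once:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input where the defines are not on the first line.
string source = "int x;\n#define SOMETHING_A 99\n#define SOMETHING_B 100\n";
string pattern = @"^#define\s+(\S+)\s+(\S+)";

// Without Multiline, ^ only matches at the very start of the string.
int withoutMultiline = Regex.Matches(source, pattern).Count;
// With Multiline, ^ also matches right after each newline.
int withMultiline = Regex.Matches(source, pattern, RegexOptions.Multiline).Count;

Console.WriteLine(withoutMultiline); // 0
Console.WriteLine(withMultiline);    // 2
```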

Upvotes: 2
