user2370532

Reputation: 11

Regular expression to match a C structure

I would like a regular expression to match a C struct definition. This is my target data:

typedef struct
{
}dontMatchThis;

typedef struct
{
  union //lets have a union as well
  {
    struct 
    {
     int a
     //a comment for fun

     int b;
     int c;
    };
    char byte[10];
  };
}structA;

I want to match only the definition of structA, from typedef to structA.

I have tried: typedef[\s\S]+?structA

But even though I'm using the non-greedy modifier, this matches both structures. Any suggestions?

Upvotes: 1

Views: 3434

Answers (4)

antarcticus

Reputation: 1

As ctn stated, the non-greedy modifier does not help here: the regex starts at the first typedef it finds and stops at the first structA after it, and everything in between is accepted as part of the match. One way to solve this with a regex is to match each struct definition first and then, in a separate step, check whether the match is the struct you want.

For example, using the regex:

(typedef[\s\S]+?})\s*([a-zA-Z0-9_]+)\s*;

This defines 2 groups. The first starts at a typedef and, matching non-greedily, ends at the closing curly brace that is directly followed by a name; it contains the text that you might want. The second captures the struct name, ([a-zA-Z0-9_]+), which must be followed by ;. For your example, there will be 2 matches, each containing 2 groups.

Match 1:

(typedef struct
{
})(dontMatchThis);

Value of group 2: dontMatchThis

Match 2:

(typedef struct
{
  union //lets have a union as well
  {
    struct 
    {
     int a
     //a comment for fun

     int b;
     int c;
    };
    char byte[10];
  };
})(structA);

Value of group 2: structA

All that remains is to check whether the value of group 2 is structA.
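The two-stage idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name find_typedef and the sample string are hypothetical, and the regex is the one given above, so it shares that regex's limits (for instance, a nested struct that has its own member name followed by ; would end the match early).

```python
import re

# Sample input modelled on the question's data (hypothetical test string).
SOURCE = """\
typedef struct
{
}dontMatchThis;

typedef struct
{
  union //lets have a union as well
  {
    struct
    {
     int a;
     int b;
     int c;
    };
    char byte[10];
  };
}structA;
"""

# Stage 1: the regex from the answer. Group 1 is the body, group 2 the name.
TYPEDEF_RE = re.compile(r"(typedef[\s\S]+?})\s*([a-zA-Z0-9_]+)\s*;")


def find_typedef(source, name):
    """Return the full text of the typedef whose trailing name is `name`, or None."""
    for match in TYPEDEF_RE.finditer(source):
        # Stage 2: keep only the match whose captured name is the wanted one.
        if match.group(2) == name:
            return match.group(0)
    return None


result = find_typedef(SOURCE, "structA")
```

Here result holds the text from the second typedef through structA;, and asking for a name that does not occur returns None.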

Upvotes: 0

user2370532

Reputation: 11

I found the following works for me:

([\s\S]*)(typedef([\s\S]*)?structA)

I then select the second group, which contains my structure. The first [\s\S]* is greedy, so it consumes all the definitions before the target struct and leaves only the last typedef that is followed by structA for the second group.
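A minimal Python sketch of this approach, assuming the pattern is meant to carry * quantifiers after each [\s\S] (without them it can match only single characters). The sample string is a shortened, hypothetical stand-in for the question's data.

```python
import re

# Shortened stand-in for the question's two typedefs.
source = (
    "typedef struct\n{\n}dontMatchThis;\n\n"
    "typedef struct\n{\n  union\n  {\n    int a;\n  };\n}structA;\n"
)

# Group 1 greedily swallows everything it can, then backtracks only until
# "typedef ... structA" can still match, which selects the last such typedef.
pattern = re.compile(r"([\s\S]*)(typedef([\s\S]*)?structA)")

match = pattern.search(source)
struct_text = match.group(2)  # text from the last typedef up to structA
```

Note that the inner group is greedy too, so if the text structA appears again later in the input (for example in a variable declaration), group 2 would extend to that last occurrence.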

Upvotes: 0

ctn

Reputation: 2930

The problem is the point where the regexp begins matching. It correctly starts matching at the first typedef and continues until structA.
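A quick way to see this, sketched in Python with a shortened, hypothetical stand-in for the question's data:

```python
import re

# Shortened stand-in for the question's two typedefs.
source = (
    "typedef struct\n{\n}dontMatchThis;\n\n"
    "typedef struct\n{\n  int a;\n}structA;\n"
)

match = re.search(r"typedef[\s\S]+?structA", source)

# The engine tries start positions from left to right, so the match begins
# at the first typedef; laziness only affects where the match ends.
starts_at_first_typedef = match.start() == 0
spans_both_structs = "dontMatchThis" in match.group(0)
```

Both flags come out true: the lazy quantifier shortens the end of the match, not its start.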

What you're trying to do is really difficult (I would say impossible to do correctly). You would need to match nested braces to find where the struct ends.

See Building a Regex Based Parser.
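For simple inputs, the brace matching can be done with a small counter instead of a regex. The following Python sketch is not a C parser: the function name extract_typedef_struct is hypothetical, and it assumes that braces never appear inside comments, strings or character literals and that no preprocessor macros are involved.

```python
import re


def extract_typedef_struct(source, name):
    """Return the text of `typedef ... { ... } name;`, or None if not found.

    Assumes braces never occur in comments or literals (a simplification).
    """
    for start in re.finditer(r"\btypedef\b", source):
        open_pos = source.find("{", start.end())
        if open_pos == -1:
            return None
        semi_pos = source.find(";", start.end())
        if semi_pos != -1 and semi_pos < open_pos:
            continue  # a brace-less typedef such as `typedef int foo;`
        # Walk forward, tracking nesting depth until the opening brace closes.
        depth = 0
        close_pos = None
        for pos in range(open_pos, len(source)):
            if source[pos] == "{":
                depth += 1
            elif source[pos] == "}":
                depth -= 1
                if depth == 0:
                    close_pos = pos
                    break
        if close_pos is None:
            return None  # unbalanced braces
        # The typedef name must directly follow the matching closing brace.
        tail = re.match(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*;", source[close_pos + 1:])
        if tail and tail.group(1) == name:
            return source[start.start():close_pos + 1 + tail.end()]
    return None


example = (
    "typedef struct\n{\n}dontMatchThis;\n\n"
    "typedef struct\n{\n  union\n  {\n    struct\n    {\n      int a;\n"
    "    } inner;\n    char byte[10];\n  };\n}structA;\n"
)
found = extract_typedef_struct(example, "structA")
```

Because the nesting depth is tracked explicitly, the named inner member (} inner;) in the example does not end the match early, which a purely lazy regex would not handle.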

Upvotes: 1

In the general case, it is simply not possible. The typedef or the struct could have been generated by preprocessor macro invocations: you could have the typedef in one file and the struct in another #include-d file, or the struct coming from one preprocessor macro and the typedef from another.

I would suggest instead extending or customizing the GCC compiler, either through a plugin or a MELT extension (MELT is a domain-specific language for extending GCC).

See also etags

Upvotes: 1
