Reputation: 11
I would like a regular expression to match a C struct definition. This is my target data:
typedef struct
{
}dontMatchThis;
typedef struct
{
union //lets have a union as well
{
struct
{
int a
//a comment for fun
int b;
int c;
};
char byte[10];
};
}structA;
I want to match only the definition of structA, from typedef to structA.
I have tried:
typedef[\s\S]+?structA
But even though I'm using the non-greedy modifier, this matches both structures. Any suggestions?
Upvotes: 1
Views: 3434
Reputation: 1
As stated by ctn, the problem with the non-greedy modifier in your regex is that it starts matching at the first occurrence of typedef and stops at the first place where it finds structA. Everything in between is accepted as part of the match. One way to solve your problem with a regex is to define a pattern that identifies every struct, and then, in a separate stage, check whether each match corresponds to the struct that you want.
For example, using the regex:
(typedef[\s\S]+?})\s*([a-zA-Z0-9_]+)\s*;
This defines 2 groups. The first starts at typedef and ends at a closing curly brace, using non-greedy matching; it contains the text that you might want. That final curly brace is followed by the struct name, captured by the second group ([a-zA-Z0-9_]+), and then a terminating semicolon. For your example, there will be 2 matches, each containing 2 groups.
Match 1:
(typedef struct
{
})(dontMatchThis);
Value of group 2: dontMatchThis
Match 2:
(typedef struct
{
union //lets have a union as well
{
struct
{
int a
//a comment for fun
int b;
int c;
};
char byte[10];
};
})(structA);
Value of group 2: structA
Thus, it becomes a matter of checking whether the value of group 2 is structA.
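The two stages described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name find_struct and the SOURCE string are hypothetical, and the pattern is the one from this answer, unchanged.

```python
import re

# The target data from the question (assumed to be available as one string).
SOURCE = """typedef struct
{
}dontMatchThis;
typedef struct
{
union //lets have a union as well
{
struct
{
int a
//a comment for fun
int b;
int c;
};
char byte[10];
};
}structA;
"""

# Stage 1: a pattern that identifies every typedef'd struct.
STRUCT_RE = re.compile(r"(typedef[\s\S]+?})\s*([a-zA-Z0-9_]+)\s*;")


def find_struct(source, name):
    """Stage 2: return the text of the match whose group 2 equals `name`."""
    for match in STRUCT_RE.finditer(source):
        if match.group(2) == name:
            return match.group(0)
    return None  # no struct with that name


struct_a = find_struct(SOURCE, "structA")
```

Note that this relies on the inner closing braces being followed directly by a semicolon. A named nested member such as "} inner;" would end the match early, with "inner" in group 2.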
Upvotes: 0
Reputation: 11
I found the following works for me:
([\s\S]*)(typedef([\s\S]*)?structA)
I then select the second group, which contains my structure. The first [\s\S]* is greedy, so it consumes all the definitions before the target struct.
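A minimal Python sketch of this approach, assuming the quantifiers in the pattern are * (they appear to have been lost in formatting) and that the target data is held in a string. The names SOURCE and PATTERN are only for illustration.

```python
import re

# The target data from the question (assumed to be available as one string).
SOURCE = """typedef struct
{
}dontMatchThis;
typedef struct
{
union //lets have a union as well
{
struct
{
int a
//a comment for fun
int b;
int c;
};
char byte[10];
};
}structA;
"""

# Group 1 is greedy, so it swallows everything up to the LAST typedef
# that is still followed by structA; group 2 is the wanted definition.
PATTERN = re.compile(r"([\s\S]*)(typedef([\s\S]*)?structA)")

match = PATTERN.search(SOURCE)
struct_text = match.group(2) if match else None
```

The inner group is greedy as well, so if structA is mentioned again later in the file (for example as a variable's type), group 2 would run on to that last mention.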
Upvotes: 0
Reputation: 2930
The problem is the point where the regexp begins matching. It correctly starts matching at the first typedef and continues until structA.
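A small Python check makes this visible, assuming the target data from the question is held in a string (the names SOURCE and lazy_match are hypothetical). The lazy quantifier only shortens the end of the match; it does not move the start.

```python
import re

# The target data from the question (assumed to be available as one string).
SOURCE = """typedef struct
{
}dontMatchThis;
typedef struct
{
union //lets have a union as well
{
struct
{
int a
//a comment for fun
int b;
int c;
};
char byte[10];
};
}structA;
"""

# The pattern from the question, unchanged.
lazy_match = re.search(r"typedef[\s\S]+?structA", SOURCE)

# The match begins at the very first typedef (offset 0 here) ...
starts_at_first_typedef = lazy_match.start() == 0
# ... so the unwanted struct ends up inside the matched text.
includes_unwanted = "dontMatchThis" in lazy_match.group(0)
```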
What you're trying to do is really difficult (I would say impossible to do correctly) with a regular expression. You would need to match nested braces to see where the struct ends.
See Building a Regex Based Parser.
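For illustration, nested braces can be matched outside the regex with a simple depth counter. The Python sketch below is not a real C parser: the function name extract_typedef is hypothetical, and it assumes no braces appear inside comments, string literals, or preprocessor macros.

```python
def extract_typedef(source, name):
    """Return the typedef block declared as `name`, or None if not found.

    Assumption: every '{' and '}' in `source` is real C structure, i.e.
    none of them sit inside comments, strings, or macros.
    """
    pos = 0
    while True:
        start = source.find("typedef", pos)
        if start == -1:
            return None
        i = source.find("{", start)
        if i == -1:
            return None
        depth = 0
        while i < len(source):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    break  # i is the brace that closes the outer struct
            i += 1
        else:
            return None  # ran off the end: unbalanced braces
        end = source.find(";", i)
        if end == -1:
            return None
        # The declared name sits between the closing brace and the ';'.
        if source[i + 1:end].strip() == name:
            return source[start:end + 1]
        pos = end + 1  # not the one we want; continue after this typedef


SOURCE = """typedef struct
{
}dontMatchThis;
typedef struct
{
union //lets have a union as well
{
struct
{
int a
//a comment for fun
int b;
int c;
};
char byte[10];
};
}structA;
"""

struct_a = extract_typedef(SOURCE, "structA")
```

Unlike the lazy regex, this keeps working when a nested member is named (for example "} inner;"), because the depth counter skips over inner closing braces.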
Upvotes: 1
Reputation: 1
In the general case, it is simply not possible. The typedef or the struct could have been generated by preprocessor macro invocations (you could have typedef in one file and struct in another #include-d file, or struct coming from one preprocessor macro and typedef from another).
I would suggest instead extending or customizing the GCC compiler, either through a plugin or a MELT extension (MELT is a domain-specific language to extend GCC).
See also etags.
Upvotes: 1