Reputation: 1571
I have a structured file with hierarchical text that describes a GUI in Delphi (a DFM file).
Let's assume I have to match all "Color = xxx" lines that are in the context of a TMyButton (marked below), but not those in any other context. Within the TMyButton context there won't be a deeper hierarchical level.
object frmMain: TfrmMain
  Left = 311
  Top = 201
  Color = clBtnFace
  object MyFirstButton: TMyButton
    Left = 555
    Top = 301
    Color = 16645072 <<<<<<MATCH THIS
    OnClick = ButtonClick
  end
  object MyLabel: TLabel
    Left = 362
    Top = 224
    Caption = 'a Caption'
    Color = 16772831
    Font.Color = clWindowText
  end
  object Panel2: TLTPanel
    Left = 348
    Top = 58
    Width = 444
    Height = 155
    Color = clRed
    object MyOtherButton: TMyButton
      Left = 555
      Top = 301
      Color = 16645072 <<<<<<MATCH THIS
      OnClick = ButtonClick
    end
  end
end
I have spent two days on this with many, many different attempts. Here are some incomplete pieces of my pattern:
/^[ ]{2,}object [A-Za-z0-9]+: TmyButton\r\n/mi <<<Matches the needed context
/^[ ]{4,}Color = [A-Za-z0-9]+\r\n/mi <<<Matches the needed result
/^[ ]{2,}end\r\n/mi <<<Matches the end of the context
(I don't know why, but I had to use "\r\n" instead of "$".) I need to put these pieces together so that the pattern ignores all other lines except the other "object xxx: yyy" and "end" lines.
I would be glad to have some help!
Upvotes: 1
Views: 2448
Reputation: 13203
I know this is not PCRE, but it is a good alternative for software archaeology.
You could use AWK for this if you work from a command prompt. The script would look like this:
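BEGIN { inObj = 0 }                    # not strictly necessary
{ sub(/\r$/, "") }                     # strip the CR of CRLF line endings
/: TMyButton$/ { inObj = 1 }           # entering a TMyButton block
/^[ ]*end$/    { inObj = 0 }           # leaving the current block
inObj && /^[ ]*Color = [A-Za-z0-9]+$/ {
    # do whatever you need to do
    print $3
}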
AWK can be found all over the internet. I would try GAWK.
Upvotes: 1
Reputation: 336408
Matching a line in a complex context requires a regex feature called lookaround, if you want or have to do it with a single regex. Specifically, you'd need variable-length lookbehind which PCRE doesn't offer.
So there are two possibilities: Use a scripting approach like Rorick suggested or use a regex that matches everything from the start of your needed context until the actual match, and extract that using a capturing group. That could be done with
[ ]{2,}object \w+: TMyButton\r\n.*?^([ ]{4,}Color = \w+[ \t]*\r\n)
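(brackets around the space inserted for clarity). The pattern needs the s modifier so that .*? can span several lines and the m modifier so that ^ matches at the start of each line. Your match would then be in capturing group \1.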
Nested structures generally are not well suited for regexes (better for parsers) but if you're sure of the structure of your data as you mentioned, it might work OK.
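As a rough illustration of the capturing-group approach, here is a minimal Python sketch that applies the pattern above with the DOTALL and MULTILINE flags. The sample text and the function name button_color_lines are made up for this example, and it assumes the file uses \r\n line endings as in the question.

```python
import re

# Hypothetical sample modelled on the question; DFM files saved on
# Windows normally use \r\n line endings, which the pattern expects.
SAMPLE = "\r\n".join([
    "object frmMain: TfrmMain",
    "  Color = clBtnFace",
    "  object MyFirstButton: TMyButton",
    "    Left = 555",
    "    Color = 16645072",
    "    OnClick = ButtonClick",
    "  end",
    "  object MyLabel: TLabel",
    "    Color = 16772831",
    "    Font.Color = clWindowText",
    "  end",
    "  object Panel2: TLTPanel",
    "    Color = clRed",
    "    object MyOtherButton: TMyButton",
    "      Left = 555",
    "      Color = 16645072",
    "    end",
    "  end",
    "end",
    "",
])

# DOTALL lets .*? cross line breaks; MULTILINE lets ^ match at line starts.
PATTERN = re.compile(
    r"[ ]{2,}object \w+: TMyButton\r\n.*?^([ ]{4,}Color = \w+[ \t]*\r\n)",
    re.DOTALL | re.MULTILINE,
)


def button_color_lines(text):
    """Return the Color line (group 1) following each TMyButton header."""
    return [m.group(1).strip() for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    for line in button_color_lines(SAMPLE):
        print(line)
```

One caveat of this pattern: if a TMyButton block has no Color line at all, the lazy .*? runs past its end and captures the Color line of a later object instead.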
Upvotes: 1
Reputation: 8953
If I understand you correctly, you are trying to create a single regexp for this. There is no reason to do so. Scan the file line by line instead: first look for a line matching
object [A-Za-z0-9]+: TmyButton
then test the following lines against
Color = [A-Za-z0-9]+
until you find it or reach the end
keyword. If you need to modify a bulk of source files, you could use some scripting for this purpose.
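A minimal Python sketch of such a script might look like the following. The function name find_button_colors and the sample lines are invented for illustration, and it relies on the question's assumption that a TMyButton block contains no nested objects.

```python
import re

# Header of a TMyButton block; group 1 captures the component name.
BUTTON_RE = re.compile(r"^\s*object (\w+): TMyButton\s*$", re.IGNORECASE)
# The "end" keyword that closes any block.
END_RE = re.compile(r"^\s*end\s*$")
# A plain Color property; "Font.Color = ..." does not match.
COLOR_RE = re.compile(r"^\s*Color = (\w+)\s*$")


def find_button_colors(lines):
    """Return (component name, color) pairs for every TMyButton block."""
    found = []
    current = None  # name of the TMyButton being scanned, if any
    for line in lines:
        button = BUTTON_RE.match(line)
        if button:
            current = button.group(1)
        elif END_RE.match(line):
            current = None
        elif current is not None:
            color = COLOR_RE.match(line)
            if color:
                found.append((current, color.group(1)))
                current = None  # found it, stop looking in this block
    return found


# Hypothetical sample modelled on the question.
SAMPLE_LINES = [
    "object frmMain: TfrmMain",
    "  Color = clBtnFace",
    "  object MyFirstButton: TMyButton",
    "    Left = 555",
    "    Color = 16645072",
    "    OnClick = ButtonClick",
    "  end",
    "  object MyLabel: TLabel",
    "    Color = 16772831",
    "    Font.Color = clWindowText",
    "  end",
    "  object Panel2: TLTPanel",
    "    Color = clRed",
    "    object MyOtherButton: TMyButton",
    "      Color = 16645072",
    "    end",
    "  end",
    "end",
]

if __name__ == "__main__":
    for name, color in find_button_colors(SAMPLE_LINES):
        print(name, color)
```

Because each line is tested on its own, the \s*$ at the end of every pattern also absorbs a trailing \r, so the same code works for files with either kind of line ending.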
Upvotes: 1