B Hart

Reputation: 1118

RegEx Only Return matches if words are present between two words

I have a large device configuration file, and I'm trying to use a regular expression to pull out the relevant portions for further processing. Each section I want to parse starts with a line of the form "edit ServiceName ;mode" and ends with the word "exit" on its own line, so both the config file and the matched string span multiple lines. I only want to match the sections that contain certain keywords.

Sub TestRegEx_1()
Dim TestString
Dim objRegEx, f_objResults, f_Match

TestString = "edit NonMatch1 ;mode" & vbCrLf & _
    "Something Random" & vbCrLf & "Something Random" & vbCrLf & _
    "exit" & vbCrLf & _
    "edit NonMatch2 ;mode" & vbCrLf & _
    "Something Random" & vbCrLf & _
    "exit" & vbCrLf & _
    "edit GoodMatch1 ;mode" & vbCrLf & _
    "Something Random" & vbCrLf & "Something Random" & vbCrLf & _
    "KeyWord_1 1 2 and 3" & vbCrLf & _
    "exit" & vbCrLf & _
    "edit GoodMatch2 ;mode" & vbCrLf & _
    "KeyWord_2 A B and C" & vbCrLf & _
    "Something Random" & vbCrLf & "Something Random" & vbCrLf & _
    "exit" & vbCrLf & _
    "edit NonMatch3 ;mode" & vbCrLf & _
    "Something Random" & vbCrLf & "Something Random" & vbCrLf & _
    "exit" & vbCrLf & _
    "edit GoodMatch3 ;mode" & vbCrLf & _
    "Something Random" & vbCrLf & "Something Random" & vbCrLf & _
    "KeyWord_3 1A" & vbCrLf & _
    "Something Random" & vbCrLf & "Something Random" & vbCrLf & _
    "exit"

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.IgnoreCase = True
objRegEx.MultiLine = True
objRegEx.Global = True

objRegEx.Pattern = "^edit (.{0,}) \;mode[\s\S]*?" & _
 "(?=(KeyWord_1|KeyWord_2|KeyWord_3))[\s\S]*?exit$"

Set f_objResults = objRegEx.Execute(TestString)
For Each f_Match In f_objResults
    MsgBox f_Match.Value
Next
End Sub

Because RegEx is greedy, the routine above returns a match containing parts that I do not want. I was able to get it working by splitting the routine into two separate RegEx pattern searches, but I would like to modify my initial pattern so that I do not have to do this. The routine below produces the output that I am looking for.

Sub TestRegEx_2()
Dim TestString
Dim objRegEx, f_objResults, f_Match

TestString = "edit NonMatch1 ;mode" & vbCrLf & _
    "Something Random" & vbCrLf & "Something Random" & vbCrLf & _
    "exit" & vbCrLf & _
    "edit NonMatch2 ;mode" & vbCrLf & _
    "Something Random" & vbCrLf & _
    "exit" & vbCrLf & _
    "edit GoodMatch1 ;mode" & vbCrLf & _
    "Something Random" & vbCrLf & "Something Random" & vbCrLf & _
    "KeyWord_1 1 2 and 3" & vbCrLf & _
    "exit" & vbCrLf & _
    "edit GoodMatch2 ;mode" & vbCrLf & _
    "KeyWord_2 A B and C" & vbCrLf & _
    "Something Random" & vbCrLf & "Something Random" & vbCrLf & _
    "exit" & vbCrLf & _
    "edit NonMatch3 ;mode" & vbCrLf & _
    "Something Random" & vbCrLf & "Something Random" & vbCrLf & _
    "exit" & vbCrLf & _
    "edit GoodMatch3 ;mode" & vbCrLf & _
    "Something Random" & vbCrLf & "Something Random" & vbCrLf & _
    "KeyWord_3 1A" & vbCrLf & _
    "Something Random" & vbCrLf & "Something Random" & vbCrLf & _
    "exit"

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.IgnoreCase = True
objRegEx.MultiLine = True
objRegEx.Global = True

'This Works...
objRegEx.Pattern = "^edit (.{0,}) \;mode[\s\S]*?exit$"
Set f_objResults = objRegEx.Execute(TestString)

objRegEx.Pattern = "(?=(KeyWord_1|KeyWord_2|KeyWord_3))"
For Each f_Match In f_objResults
    If objRegEx.test(f_Match.Value) Then
        MsgBox f_Match.Value
    End If
Next

End Sub

What do I need to change in my initial pattern to make this work without a second RegEx pattern? How do I explicitly tell the RegEx engine to stop at the first "exit", so that when a section contains no keyword it does not keep pulling in further sections until a keyword is found? Any help is greatly appreciated! Thank you.

EDIT: Added the parts of my test string that I want the match to return. The "GoodMatch" sections can contain one or more of the keywords, and I need the full section returned.

edit GoodMatch1 ;mode
Something Random
Something Random
KeyWord_1 1 2 and 3
exit

edit GoodMatch2 ;mode
KeyWord_2 A B and C
Something Random
Something Random
exit

edit GoodMatch3 ;mode
Something Random
Something Random
KeyWord_3 1A
Something Random
Something Random
exit

Upvotes: 1

Views: 1363

Answers (3)

Jerry

Reputation: 71588

I'm not sure what your full config file looks like, but you might try something like:

(KeyWord_1|KeyWord_2|KeyWord_3)(?=(?:(?!edit)[\s\S])*?exit)

This will match only within an 'edit ... exit' block.

Or:

(KeyWord_1|KeyWord_2|KeyWord_3)(?=(?:(?!edit[^;]+;mode)[\s\S])*?exit)

For a specific 'edit ... ;mode ... exit' block.

The lookahead is what forces the match to be within an 'edit ... exit' block, basically by making sure that there's no 'edit' before the next 'exit'. If you're within a block, there will be no 'edit' in between, and so there's a match. If you're outside, you're bound to hit 'edit' before 'exit', and hence there's no match.
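
As a rough sketch of how the first pattern behaves, the snippet below runs it over a tiny made-up string with one keyword outside a block and one inside. The test string, the variable names, and the use of WScript.Echo (which assumes the script runs under Windows Script Host) are illustrative assumptions, not part of the original config.

```vbscript
' Sketch only: the string and names below are invented for illustration.
Dim re, m, s
s = "KeyWord_1 outside any block" & vbCrLf & _
    "edit Demo ;mode" & vbCrLf & _
    "KeyWord_2 inside a block" & vbCrLf & _
    "exit"

Set re = New RegExp
re.Global = True
' A keyword only matches if "exit" can be reached without crossing "edit".
re.Pattern = "(KeyWord_1|KeyWord_2|KeyWord_3)(?=(?:(?!edit)[\s\S])*?exit)"

For Each m In re.Execute(s)
    WScript.Echo m.Value
Next
```

Only KeyWord_2 should be reported here. Keep in mind that the guard looks for the literal text edit, so a word such as "credit" later in the same block would also stop the lookahead; the second pattern is stricter in that respect.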


EDIT: To get the whole block, you can use:

edit(?=(?:(?!exit)[\S\s])*\b(KeyWord_1|KeyWord_2|KeyWord_3)\b)(?:(?!exit)[\S\s])*exit

The match itself is the block, the sub-matches are the keywords.
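
A minimal sketch of using that pattern from VBScript, assuming a shortened version of the test string from the question and output via WScript.Echo (both are assumptions for illustration):

```vbscript
' Sketch only: a shortened test string with one block lacking a keyword.
Dim re, m, s
s = "edit NonMatch1 ;mode" & vbCrLf & _
    "Something Random" & vbCrLf & _
    "exit" & vbCrLf & _
    "edit GoodMatch1 ;mode" & vbCrLf & _
    "Something Random" & vbCrLf & _
    "KeyWord_1 1 2 and 3" & vbCrLf & _
    "exit"

Set re = New RegExp
re.Global = True
' The lookahead requires a keyword before the next "exit";
' the rest of the pattern then consumes the block up to that "exit".
re.Pattern = "edit(?=(?:(?!exit)[\S\s])*\b(KeyWord_1|KeyWord_2|KeyWord_3)\b)" & _
             "(?:(?!exit)[\S\s])*exit"

For Each m In re.Execute(s)
    WScript.Echo "Keyword: " & m.SubMatches(0)
    WScript.Echo m.Value
Next
```

With this input, only the GoodMatch1 block should be echoed. Because the guard is the literal text exit, this relies on "exit" not appearing anywhere inside a block other than on its closing line.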

Upvotes: 4

Ansgar Wiechers

Reputation: 200483

Your regular expression is not greedy, but you've fallen victim to a common misunderstanding about non-greedy matches. A non-greedy quantifier does not produce the shortest possible match overall; it produces the match that starts at the current cursor position and extends to the next occurrence of whatever follows the non-greedy (sub)expression.

Let's take a look at (part of) your test string:

edit NonMatch1 ;mode
Something Random
Something Random
exit
edit NonMatch2 ;mode
Something Random
exit
edit GoodMatch1 ;mode
Something Random
Something Random
KeyWord_1 1 2 and 3
exit
edit GoodMatch2 ;mode
KeyWord_2 A B and C
Something Random
Something Random
exit

What you want as the first match is this:

edit GoodMatch1 ;mode
Something Random
Something Random
KeyWord_1 1 2 and 3
exit

but what you actually get is this, starting at the very first edit line:

edit NonMatch1 ;mode
Something Random
Something Random
exit
edit NonMatch2 ;mode
Something Random
exit
edit GoodMatch1 ;mode
Something Random
Something Random
KeyWord_1 1 2 and 3
exit

The reason for this is that when the regexp parser starts reading your string, the first line matches the first part of your expression (^edit (.{0,}) \;mode). The next part of the expression ([\s\S]*?(?=(KeyWord_1|KeyWord_2|KeyWord_3))) then matches everything from the line break at the end of that line up to the first occurrence of one of your three keywords, thus spanning several edit sections.
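
The effect can be reproduced on a much smaller string. The sketch below uses a made-up one-line input and the placeholder keyword KEY (both assumptions for illustration). The second pattern adds a negative-lookahead guard as one possible way to keep the match inside a single section; it is shown only for contrast.

```vbscript
' Sketch only: a one-line stand-in for two edit sections.
Dim re, s
s = "edit A exit edit B KEY exit"

Set re = New RegExp

' Lazy quantifier: the match still starts at the FIRST "edit"
' and grows until KEY is found, crossing the first "exit".
re.Pattern = "edit[\s\S]*?KEY"
WScript.Echo re.Execute(s)(0).Value

' Guarded version: no character may be consumed where "exit" begins,
' so the attempt at the first "edit" fails and the engine moves on.
re.Pattern = "edit(?:(?!exit)[\s\S])*?KEY"
WScript.Echo re.Execute(s)(0).Value
```

The first call should echo "edit A exit edit B KEY" and the second "edit B KEY", which is the difference between "next occurrence from the current position" and "shortest possible match".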

The simplest solution to your problem might be using a regexp to indiscriminately divide the string into edit sections, and then use a string match for selecting the ones you want:

testString = "..."

Set re = New RegExp
re.IgnoreCase = True
re.MultiLine  = True
re.Global     = True
re.Pattern    = "^edit (.*) \;mode[\s\S]*?exit$"

For Each m In re.Execute(testString)
  If InStr(m.Value, "KeyWord_1") > 0 Then
    'do some
  ElseIf InStr(m.Value, "KeyWord_2") > 0 Then
    'do other
  ElseIf InStr(m.Value, "KeyWord_3") > 0 Then
    'do something completely different
  End If
Next

Of course you could also use another regular expression inside the loop:

testString = "..."

Set re = New RegExp
re.IgnoreCase = True
re.MultiLine  = True
re.Global     = True
re.Pattern    = "^edit (.*) \;mode[\s\S]*?exit$"

Set keywords = New RegExp
keywords.IgnoreCase = True
keywords.Pattern    = "keyword_1|keyword_2|keyword_3"

For Each m In re.Execute(testString)
  If keywords.Test(m.Value) Then
    WScript.Echo m.Value
  End If
Next

Upvotes: 1

David Candy

Reputation: 743

You need laziness, which is what the ? after a quantifier gives you.

http://www.regular-expressions.info/repeat.html

Upvotes: 0
