Pankaj Jaju

Reputation: 5471

VBScript RegEx - Find block of data between a pattern

I am trying to use RegEx to get blocks of data from a multi-line string.

String to search

***** a.txt
17=xxx
570=N
55=yyy
***** b.TXT
17=XXX
570=Y
55=yyy
*****

***** a.txt
38=10500.000000
711=1
311=0000000006630265
***** b.TXT
38=10500.000000
311=0000000006630265
*****

What I need: everything between the ***** delimiter lines

17=xxx
570=N
55=yyy

17=XXX
570=Y
55=yyy

38=10500.000000
711=1
311=0000000006630265

38=10500.000000
311=0000000006630265

My code so far

Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.MultiLine = True
objRegEx.IgnoreCase = True
objRegEx.Pattern = "\*\*\*\*\*(?:.|\n|\r)*?\*\*\*\*\*"
Set strMatches = objRegEx.Execute(objExec.StdOut.ReadAll())
If strMatches.Count > 0 Then
    For Each strMatch In strMatches
        Wscript.Echo strMatch
    Next
End If
Set objRegEx = Nothing

Upvotes: 1

Views: 1700

Answers (2)

Wiktor Stribiżew

Reputation: 626748

You need to turn the trailing \*\*\*\*\* part of your pattern, which currently consumes the closing delimiter, into a positive lookahead, so that the same delimiter can also open the next block. It is also highly recommended to get rid of (?:.|\n|\r)*?, since the alternation slows down the matching process; use [\s\S]*? instead.
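To illustrate why the closing delimiter should not be consumed, here is a minimal sketch that counts matches with a consuming pattern and with a lookahead pattern. The sample string and the variable names are made up for illustration; they stand in for the real output read from objExec.StdOut.

```vbscript
Option Explicit

' Hypothetical sample: two blocks that share the middle ***** delimiter
Dim sample
sample = Join(Array("***** a.txt", "17=xxx", "***** b.TXT", "17=XXX", "*****", ""), vbCrLf)

Dim re
Set re = New RegExp
re.Global = True

' Consuming version: the closing ***** becomes part of the match,
' so it is no longer available to open the next block.
re.Pattern = "\*{5}[\s\S]*?\*{5}"
WScript.Echo "consuming: " & re.Execute(sample).Count

' Lookahead version: the closing ***** is only checked, not consumed,
' so the next match can start at that same delimiter.
re.Pattern = "\*{5}[\s\S]*?(?=\*{5})"
WScript.Echo "lookahead: " & re.Execute(sample).Count
```

Run with cscript; on this sample the consuming pattern should find one block and the lookahead pattern two, which is the every-other-block loss seen with the original pattern.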

Use

\*{5}(?!\s*\*{5}).*[\r\n]+([\s\S]*?)(?=\*{5})

and grab the first item in Submatches. The .*[\r\n]+ part skips the rest of the line that starts with *****.

Details:

  • \*{5} - 5 asterisks
  • (?!\s*\*{5}) - fail the match if zero or more whitespace characters followed by 5 asterisks come next
  • .*[\r\n]+ - match the rest of the line together with the line break(s) that follow it
  • ([\s\S]*?) - capturing group 1 (its value is stored in the Submatches property of the Match object), matching any zero or more characters, as few as possible, up to the first...
  • (?=\*{5}) - ...location that is followed by 5 asterisks; they are not consumed, only their presence is checked.
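The breakdown above can be tried out with a short self-contained sketch. The sample text is a hypothetical, shortened stand-in for the real objExec.StdOut.ReadAll() result, and the variable names are chosen only for this illustration.

```vbscript
Option Explicit

' Hypothetical sample text standing in for objExec.StdOut.ReadAll()
Dim sample
sample = Join(Array( _
    "***** a.txt", "17=xxx", "570=N", _
    "***** b.TXT", "17=XXX", "570=Y", _
    "*****", ""), vbCrLf)

Dim re, m
Set re = New RegExp
re.Global = True
' \*{5}         opening delimiter
' (?!\s*\*{5})  skip a delimiter that is directly followed by another one
' .*[\r\n]+     rest of the delimiter line plus its line break(s)
' ([\s\S]*?)    the block itself (group 1)
' (?=\*{5})     next delimiter, checked but not consumed
re.Pattern = "\*{5}(?!\s*\*{5}).*[\r\n]+([\s\S]*?)(?=\*{5})"

For Each m In re.Execute(sample)
    ' Brackets make the captured line breaks visible
    WScript.Echo "[" & m.SubMatches(0) & "]"
Next
```

Each captured block keeps its trailing line break, because the lazy group runs right up to the next delimiter; trim it afterwards if that matters for your output.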


If you unroll the regex, it will look uglier, but it is much more efficient:

\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)


VBS code:

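' objExec comes from the question's script (the object whose StdOut is read)
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True   ' find every block, not only the first one
objRegEx.Pattern = "\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)"
Set strMatches = objRegEx.Execute(objExec.StdOut.ReadAll())
If strMatches.Count > 0 Then
    For Each strMatch In strMatches
        ' Submatches(0) is capturing group 1: the text between the delimiters
        Wscript.Echo strMatch.Submatches(0)
    Next
End If
Set objRegEx = Nothing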

Upvotes: 3

MC ND

Reputation: 70923

Just capture each run of consecutive numbered lines (with /n, fc.exe puts a line number in front of every compared line):

Option Explicit

' Run fc.exe with /n so that each compared line starts with its line number
Dim data
With WScript.CreateObject("WScript.Shell")
    data = .Exec("fc.exe /n 1.txt 2.txt").StdOut.ReadAll()
End With

Dim match
With New RegExp
    ' One match per run of consecutive lines that start with a digit
    .Pattern = "(?:^[ ]*[0-9].*?$[\r\n]+)+"
    .Global = True
    .MultiLine = True
    For Each match In .Execute(data)
        WScript.StdOut.WriteLine "---------------------------------------"
        WScript.StdOut.WriteLine match.Value
    Next
End With

Upvotes: 2
