Reputation: 5471
I am trying to use RegEx to get blocks of data from a multi-line string.
String to search
***** a.txt
17=xxx
570=N
55=yyy
***** b.TXT
17=XXX
570=Y
55=yyy
*****

***** a.txt
38=10500.000000
711=1
311=0000000006630265
***** b.TXT
38=10500.000000
311=0000000006630265
*****
What I need: everything between the ***** delimiter lines
17=xxx
570=N
55=yyy
17=XXX
570=Y
55=yyy
38=10500.000000
711=1
311=0000000006630265
38=10500.000000
311=0000000006630265
My code so far
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.MultiLine = True
objRegEx.IgnoreCase = True
objRegEx.Pattern = "\*\*\*\*\*(?:.|\n|\r)*?\*\*\*\*\*"
Set strMatches = objRegEx.Execute(objExec.StdOut.ReadAll())
If strMatches.Count > 0 Then
    For Each strMatch In strMatches
        Wscript.Echo strMatch
    Next
End If
Set objRegEx = Nothing
Upvotes: 1
Views: 1700
Reputation: 626748
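You need to turn the last *-matching part of your consuming pattern into a positive lookahead. As written, the closing ***** is consumed by one match, so it is no longer available as the opening delimiter of the next block. It is also strongly recommended to get rid of (.|\r|\n)*?, since the alternation slows down matching; use [\s\S]*? instead.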
Use
\*{5}(?!\s*\*{5}).*[\r\n]+([\s\S]*?)(?=\*{5})
and grab the first item in Submatches. The .*[\r\n]+ part is there to skip the rest of the line that starts with *****.
Details:
\*{5} - 5 asterisks
(?!\s*\*{5}) - fail the match if there are 0+ whitespace characters followed by 5 asterisks
.*[\r\n]+ - match the rest of the line, including the line breaks
([\s\S]*?) - capturing group 1 (its value is stored in the Submatches property of the Match object), matching any 0+ chars, as few as possible, up to the first...
(?=\*{5}) - ...location followed by 5 asterisks; they are not consumed, only their presence is checked
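To see how the lookahead version behaves, here is a minimal sketch that runs the pattern above against a hard-coded sample instead of objExec.StdOut.ReadAll(). The sample string is an assumption: it is rebuilt from the data in the question, shortened, and joined with CRLF line breaks. The variable names sample, re and m are illustrative only.

```vbscript
Option Explicit

' Assumed sample: the question's data, shortened, with CRLF line breaks.
Dim sample
sample = Join(Array( _
    "***** a.txt", "17=xxx", "570=N", "55=yyy", _
    "***** b.TXT", "17=XXX", "570=Y", "55=yyy", _
    "*****", "", _
    "***** a.txt", "38=10500.000000", "711=1", _
    "***** b.TXT", "38=10500.000000", _
    "*****", ""), vbCrLf)

Dim re, m
Set re = New RegExp
re.Global = True
' The closing delimiter sits in a lookahead, so it is not consumed
' and can still serve as the opening delimiter of the next block.
re.Pattern = "\*{5}(?!\s*\*{5}).*[\r\n]+([\s\S]*?)(?=\*{5})"

For Each m In re.Execute(sample)
    WScript.Echo "[block]"
    ' SubMatches(0) holds capturing group 1: the lines between delimiters.
    WScript.Echo m.SubMatches(0)
Next
```

Run it with cscript.exe so that WScript.Echo writes to the console rather than opening a dialog for each block. Each captured block keeps its trailing line break, so trim it if you need clean values.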
If you unroll the regex, it will look uglier, but it is much more efficient:
\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)
VBS code:
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "\*{5}(?!\s*\*{5}).*[\r\n]+([^*]*(?:\*(?!\*{4})[^*]*)*)"
Set strMatches = objRegEx.Execute(objExec.StdOut.ReadAll())
If strMatches.Count > 0 Then
    For Each strMatch In strMatches
        Wscript.Echo strMatch.Submatches(0)
    Next
End If
Set objRegEx = Nothing
Upvotes: 3
Reputation: 70923
Just capture the sets of consecutive numbered lines (the /n switch makes fc.exe prefix each line with its line number):
Option Explicit
Dim data
With WScript.CreateObject("WScript.Shell")
    data = .Exec("fc.exe /n 1.txt 2.txt").StdOut.ReadAll()
End With
Dim match
With New RegExp
    .Pattern = "(?:^[ ]*[0-9].*?$[\r\n]+)+"
    .Global = True
    .MultiLine = True
    For Each match in .Execute( data )
        WScript.StdOut.WriteLine "---------------------------------------"
        WScript.StdOut.WriteLine match.Value
    Next
End With
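If you want to try the pattern without running fc.exe, the same loop can be fed a hard-coded string. This is only a sketch: the sample below is hypothetical, and the exact spacing of the line-number prefix that fc.exe /n produces is an assumption, so check it against your real output.

```vbscript
Option Explicit

' Hypothetical "fc.exe /n" style output; the layout of the
' line-number prefix is an assumption, not captured output.
Dim data
data = Join(Array( _
    "Comparing files 1.txt and 2.TXT", _
    "***** 1.txt", _
    "    1:  17=xxx", _
    "    2:  570=N", _
    "***** 2.TXT", _
    "    1:  17=XXX", _
    "    2:  570=Y", _
    "*****", _
    ""), vbCrLf)

Dim match
With New RegExp
    ' One or more consecutive lines that start with optional spaces and a digit.
    .Pattern = "(?:^[ ]*[0-9].*?$[\r\n]+)+"
    .Global = True
    .MultiLine = True
    For Each match In .Execute(data)
        WScript.StdOut.WriteLine "---------------------------------------"
        WScript.StdOut.WriteLine match.Value
    Next
End With
```

WScript.StdOut is only available under cscript.exe. The header and ***** lines do not start with a digit, so they act as separators between the captured runs; note that the captured lines still carry their line-number prefix.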
Upvotes: 2