user1502755

Reputation: 51

VBS regex extract multiple blocks

I want to extract multiple blocks of text using regex. My regex gets the correct start but also returns everything to the end of my file.

I am using:

re.ignorecase = true
re.multiline = false
re.global = true
re.pattern = "\balias\s=\sX[\s\S]{1,}end"

An example of the file format is:

Metadata Begin
    Easting Begin
        alias = X
        projection = "geodetic"
        datum = "GDA94"
    Easting End
    Northing Begin
        alias = Y
        projection = "geodetic"
        datum = "GDA94"
    Northing End
Metadata End

I want to extract the text starting at alias up to the next End for each occurrence, so I can deal with the details one alias at a time. For example:

    alias = X
    projection = "geodetic"
    datum = "GDA94"
Easting End

But this does not stop at the first End after the alias. Instead, [\s\S] matches everything after that first alias up to the end of the file. But [\s\S] is the only trick I can think of to get past the CrLf at the end of each line.

Is there a regex that matches up to the first End across multiple lines?

Upvotes: 0

Views: 235

Answers (2)

Tomalak

Reputation: 338228

I would suggest a multi-step approach.

  1. Single out the blocks:

    (Easting|Northing) Begin([\s\S]*?)\1 End
    
  2. Process their contents line by line

    (\S+)\s+=\s+("?)(.*)\2
    

So, when put together, we get

Option Explicit

Dim reBlock, reLine, input
Dim blockType, blockBody, name, value
Dim block, line  ' loop variables must also be declared under Option Explicit

Set reBlock = New RegExp
Set reLine = New RegExp

input = LoadYourFile()

reBlock.Pattern = "(Easting|Northing) Begin([\s\S]*?)\1 End"
reBlock.Global = True
reBlock.IgnoreCase = True

reLine.Pattern = "(\S+)\s+=\s+(""?)(.*)\2"
reLine.Global = True
reLine.IgnoreCase = True

For Each block In reBlock.Execute(input)
    blockType = block.SubMatches(0)
    blockBody = block.SubMatches(1)
    For Each line In reLine.Execute(blockBody)
        name = line.SubMatches(0)
        value = line.SubMatches(2)
        WScript.Echo blockType & ": " & name & " = " & value
    Next
Next

Notable features

  • non-greedy matching, as explained in @AvinashRaj's answer.
  • back-references within the regular expression
  • structured approach allows easy output of context information (i.e. "Which block does this value belong to?")
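
LoadYourFile() in the script above is only a placeholder. One way it could be filled in is sketched below, assuming the file is plain text that fits comfortably in memory. The path parameter and the file name metadata.txt are hypothetical and not part of the original script; Scripting.FileSystemObject is the standard Windows Script Host COM object for file access.

```vbscript
Option Explicit

' Hypothetical helper: reads a whole text file into one string.
' The path parameter is an assumption; the original call takes no arguments.
Function LoadYourFile(path)
    Const ForReading = 1
    Dim fso, stream
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set stream = fso.OpenTextFile(path, ForReading)
    LoadYourFile = stream.ReadAll   ' ReadAll raises an error on an empty file
    stream.Close
End Function

Dim input
input = LoadYourFile("metadata.txt")  ' hypothetical file name
```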

Upvotes: 1

Avinash Raj

Reputation: 174706

You need a non-greedy regex. [\s\S]{1,} is greedy: it matches as many characters as possible, so it runs on to the last end in the file. To make the pattern stop at the first match, add the non-greedy modifier ? after {1,}, giving [\s\S]{1,}?. This can be written more simply as [\s\S]+?.

re.pattern = "\balias\s=\sX[\s\S]+?end"

Add \b before and after the string end if necessary, so that it only matches end as a whole word: \bend\b.
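
To handle every alias rather than only X, the same lazy pattern can be run with Global set and the matches iterated. The following is a minimal sketch, not a definitive implementation: the sample text is hard-coded for illustration, and replacing the literal X with \w+ is an assumption about what the question is after.

```vbscript
Option Explicit

Dim re, text, m

' Sample input, hard-coded for illustration; real code would read the file.
text = "Metadata Begin" & vbCrLf & _
       "    Easting Begin" & vbCrLf & _
       "        alias = X" & vbCrLf & _
       "        projection = ""geodetic""" & vbCrLf & _
       "    Easting End" & vbCrLf & _
       "    Northing Begin" & vbCrLf & _
       "        alias = Y" & vbCrLf & _
       "        projection = ""geodetic""" & vbCrLf & _
       "    Northing End" & vbCrLf & _
       "Metadata End"

Set re = New RegExp
re.IgnoreCase = True
re.Global = True   ' return every match, not only the first
' \w+ instead of a literal X (assumption) so that each alias is matched;
' +? is lazy, so each match stops at the first whole word "end".
re.Pattern = "\balias\s=\s\w+[\s\S]+?\bend\b"

For Each m In re.Execute(text)
    WScript.Echo "--- block ---"
    WScript.Echo m.Value
Next
```

Each m.Value then holds one block, from its alias line through the first following End, and can be processed on its own.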

Upvotes: 3
