Reputation: 51
I want to extract multiple blocks of text using regex. My regex finds the correct start, but the match then runs all the way to the end of my file.
I am using:
re.ignorecase = true
re.multiline = false
re.global = true
re.pattern = "\balias\s=\sX[\s\S]{1,}end"
An example of the file format is:
Metadata Begin
Easting Begin
alias = X
projection = "geodetic"
datum = "GDA94"
Easting End
Northing Begin
alias = Y
projection = "geodetic"
datum = "GDA94"
Northing End
Metadata End
I want to extract the text starting at alias up to the next End, for each occurrence, so I can deal with the details one alias at a time, e.g.:
alias = X
projection = "geodetic"
datum = "GDA94"
Easting End
But the match does not stop at the first End after the alias. Instead, [\s\S] matches everything after that first alias up to the end of the file. Yet [\s\S] is the only trick I can think of to get past the CrLf at the end of each line.
Is there a regex that matches up to the first End over multiple lines?
Upvotes: 0
Views: 235
Reputation: 338228
I would suggest a multi-step approach.
Single out the blocks:
(Easting|Northing) Begin([\s\S]*?)\1 End
Process their contents line by line:
(\S+)\s+=\s+("?)(.*)\2
So, when put together, we get:
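Option Explicit

' block and line must be declared too, otherwise Option Explicit
' raises "Variable is undefined" in the For Each loops.
Dim reBlock, reLine, input
Dim block, line, blockType, blockBody, name, value

Set reBlock = New RegExp
Set reLine = New RegExp

' LoadYourFile() is a placeholder: replace it with your own code that
' returns the whole file as a single string.
input = LoadYourFile()

' Step 1: find each block; \1 forces the closing line to repeat the block name.
reBlock.Pattern = "(Easting|Northing) Begin([\s\S]*?)\1 End"
reBlock.Global = True
reBlock.IgnoreCase = True

' Step 2: split a block body into name = value lines, dropping optional quotes.
reLine.Pattern = "(\S+)\s+=\s+(""?)(.*)\2"
reLine.Global = True
reLine.IgnoreCase = True

For Each block In reBlock.Execute(input)
    blockType = block.SubMatches(0)
    blockBody = block.SubMatches(1)
    For Each line In reLine.Execute(blockBody)
        name = line.SubMatches(0)
        value = line.SubMatches(2)
        WScript.Echo blockType & ": " & name & " = " & value
    Next
Next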
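If you want to try the same two-step idea outside Windows Script Host, here is a rough JavaScript sketch of it. JavaScript is an assumption, picked only because it is easy to run; it reuses the two patterns above unchanged, since both regex flavors accept lazy quantifiers, backreferences and the \s/\S classes. The sample text is the file from the question with CRLF line endings assumed, and parseBlocks is a made-up helper name.

```javascript
// Sample input: the file from the question, assuming CRLF line endings.
const input = [
  "Metadata Begin",
  "Easting Begin",
  "alias = X",
  'projection = "geodetic"',
  'datum = "GDA94"',
  "Easting End",
  "Northing Begin",
  "alias = Y",
  'projection = "geodetic"',
  'datum = "GDA94"',
  "Northing End",
  "Metadata End",
].join("\r\n");

// Step 1: single out each block; \1 forces the closing line to name the same block.
const reBlock = /(Easting|Northing) Begin([\s\S]*?)\1 End/gi;
// Step 2: split a block body into name/value pairs, dropping optional quotes.
const reLine = /(\S+)\s+=\s+("?)(.*)\2/gi;

// Hypothetical helper: returns one { blockType, name, value } entry per line.
function parseBlocks(text) {
  const result = [];
  for (const block of text.matchAll(reBlock)) {
    const [, blockType, blockBody] = block;
    for (const line of blockBody.matchAll(reLine)) {
      result.push({ blockType, name: line[1], value: line[3] });
    }
  }
  return result;
}

for (const { blockType, name, value } of parseBlocks(input)) {
  console.log(`${blockType}: ${name} = ${value}`);
}
```

One difference worth checking: in JavaScript "." never matches "\r", so an unquoted value such as X comes out clean. VBScript's "." only excludes "\n", so the VBScript version may leave a trailing vbCr on unquoted values read from a CRLF file.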
Upvotes: 1
Reputation: 174706
You need a non-greedy (lazy) quantifier. [\s\S]{1,} is greedy: it matches as many characters as possible, so the engine runs to the end of the input and then backtracks only as far as the last "end". To make the pattern stop at the first "end" instead, add the lazy modifier ? right after {1,}, which gives [\s\S]{1,}?. This can be written more simply as [\s\S]+?:
re.pattern = "\balias\s=\sX[\s\S]+?end"
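A quick way to see the difference is to run both quantifiers against the sample file. This is a small JavaScript sketch (an assumption, chosen because it is easy to run; the pattern syntax used here behaves the same way in VBScript's RegExp). It assumes CRLF line endings, and the i flag plays the role of IgnoreCase = True.

```javascript
// Sample file from the question, assuming CRLF line endings.
const sample = [
  "Metadata Begin",
  "Easting Begin",
  "alias = X",
  'projection = "geodetic"',
  'datum = "GDA94"',
  "Easting End",
  "Northing Begin",
  "alias = Y",
  'projection = "geodetic"',
  'datum = "GDA94"',
  "Northing End",
  "Metadata End",
].join("\r\n");

// Greedy: [\s\S]{1,} runs to the end of the text, then backtracks to the LAST "end".
const greedy = /\balias\s=\sX[\s\S]{1,}end/i;
// Lazy: [\s\S]+? grows one character at a time and stops at the FIRST "end".
const lazy = /\balias\s=\sX[\s\S]+?end/i;

const greedyMatch = sample.match(greedy)[0];
const lazyMatch = sample.match(lazy)[0];

console.log(JSON.stringify(greedyMatch));
console.log(JSON.stringify(lazyMatch));
```

The greedy match ends at "Metadata End", the last "End" in the text, while the lazy match stops at "Easting End".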
Add \b before and after end (i.e. \bend\b) if you need to make sure it only matches End as a whole word.
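To illustrate why the word boundaries can matter, here is a JavaScript sketch against a hypothetical variant of the file containing an invented legend line (it is not in the question). It also drops the hard-coded X and uses the g flag (the counterpart of Global = True) so every alias block comes back in turn; that generalized pattern is a variation, not part of the answer above.

```javascript
// Hypothetical input: the question's layout plus an invented "legend" line,
// added only to show why the word boundaries around "end" matter.
const text = [
  "Metadata Begin",
  "Easting Begin",
  "alias = X",
  'legend = "none"',
  'datum = "GDA94"',
  "Easting End",
  "Northing Begin",
  "alias = Y",
  'datum = "GDA94"',
  "Northing End",
  "Metadata End",
].join("\r\n");

// Without \b, the lazy match stops at the "end" inside "legend".
const loose = /\balias\s=\sX[\s\S]+?end/i;
// With \b on both sides, only a whole-word "End" closes the block.
const strict = /\balias\s=\sX[\s\S]+?\bend\b/i;
// Same idea without hard-coding X: with g, match() returns every alias block.
const everyAlias = /\balias\s=\s[\s\S]+?\bend\b/gi;

console.log(JSON.stringify(text.match(loose)[0]));
console.log(JSON.stringify(text.match(strict)[0]));

const blocks = text.match(everyAlias);
for (const b of blocks) console.log(JSON.stringify(b));
```

Without \b the match ends inside "legend"; with \b it runs on to "Easting End", and the global pattern returns two blocks, one per alias.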
Upvotes: 3