jsheeran
jsheeran

Reputation: 3037

Excluding line breaks from regex capture

I realise that a similar question has been asked before and answered, but the problem persists after I've tried the solution proposed in that answer.

I want to write an Excel macro to separate a multi-line string into multiple single lines, trimmed of whitespace including line breaks. This is my code:

Sub testRegexMatch()
    Dim r As New VBScript_RegExp_55.regexp
    Dim str As String
    Dim mc As MatchCollection
    r.Pattern = "[\r\n\s]*([^\r\n]+?)[\s\r\n]*$"
    r.Global = True
    r.MultiLine = True
    str = "This is a haiku" & vbCrLf _
        & "You may read it if you wish   " & vbCrLf _
        & "   but you don't have to"
    Set mc = r.Execute(str)
    For Each Line In mc
      Debug.Print "^" & Line & "$"
    Next Line
End Sub

Expected output:

^This is a haiku$
^You may read it if you wish$
^but you don't have to$

Actual output:

^This is a haiku
$
^
You may read it if you wish   
$
^
   but you don't have to$

I've tried the same thing on Regex101, but this appears to show the correct captures, so it must be a quirk of VBA's regex engine.

Any ideas?

Upvotes: 1

Views: 496

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627600

You just need to access the captured values via SubMatches():

When a regular expression is executed, zero or more submatches can result when subexpressions are enclosed in capturing parentheses. Each item in the SubMatches collection is the string found and captured by the regular expression.

Here is my demo:

Sub DemoFn()
   Dim re, targetString, colMatch, objMatch
   Set re = New regexp
   With re
     .pattern = "\s*([^\r\n]+?)\s*$"
     .Global = True              ' Same as /g at the online tester
     .MultiLine = True           ' Same as /m at regex101.com
   End With
   targetString = "This is a haiku  " & vbLf & "  You may read it if you wish " & vbLf & "    but you don't have to"
   Set colMatch = re.Execute(targetString)
   For Each objMatch In colMatch
     Debug.Print objMatch.SubMatches.Item(0) ' <== SEE HERE
   Next
End Sub

It prints:

This is a haiku
You may read it if you wish
but you don't have to

Upvotes: 1

Related Questions