jamheadart
jamheadart

Reputation: 5333

RegEx array / list / collection of all matches in VBA

I'm trying to use RegEx to get all instances of varying strings that exist in between a particular pair set of strings. E.g. in the following string:

"The Start. Hello. Jamie. Bye. The Middle. Hello. Sarah. Bye. The End"

I want to get a collection / array consisting of "Jamie" and "Sarah" by checking in between "Hello. " and ". Bye. "

My RegEx object is working fine and I feel I'm nearly successful:

Sub Reggie()
Dim x As String: x = "The Start. Hello. Jamie. Bye. The Middle. Hello. Sarah. Bye. The End"
Dim regEx As RegExp
Set regEx = New RegExp
Dim rPat1 As String: rPat1 = "Hello. "
Dim rPat2 As String: rPat2 = " Bye."
Dim rPat3 As String: rPat3 = ".*"
With regEx
    .Global = True
    .ignorecase = True
    .Pattern = "(^.*" & rPat1 & ")(" & rPat3 & ")(" & rPat2 & ".*)"
    .MultiLine = True
    ' COMMAND HERE
End With
End Sub

But the last bit COMMAND HERE I'm trying .replace(x, "$2") which gives me a string of the last instance of a match i.e. Sarah

I've tried .Execute(x) which gives me a MatchCollection object and when browsing the immediate window I see that object only has the last instance of a match.

Is what I'm requiring possible and how?

Upvotes: 2

Views: 616

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627146

That is because .* matches as many any chars as possible and you should not match the whole string by adding .* on both ends of your regular expression.

Besides, you need to escape special chars in the regex pattern, here, . is special as it matches any char other than a line break char.

You need to fix your regex declaration like

rPat1 = "Hello\. "
rPat2 = " Bye\."
rPat3 = ".*?"`
.Pattern = rPat1 & "(" & rPat3 & ")" & rPat2

Or, to further enhance the regex, you may

  • Replace literal spaces with \s* (zero or more whitespaces) or \s+ (one or more whitespaces) to support any whitespace
  • Match any non-word chars after the captures string with \W+ or \W*.
rPat1 = "Hello\.\s*"
rPat2 = "\W+Bye\."
rPat3 = ".*?"`
.Pattern = rPat1 & "(" & rPat3 & ")" & rPat2

See the regex demo. Details:

  • Hello\. - Hello. string
  • \s* - zero or more whitespaces
  • (.*?) - Group 1: any zero or more chars other than line break chars as few as possible
  • \W+ - one or more chars other than ASCII letters/digits/_
  • Bye\. - Bye. string.

Upvotes: 3

Related Questions