MsGISRocker
MsGISRocker

Reputation: 642

VBA regex: extract multiple strings between strings within Excel cell with custom function

Within an Excel column I have data such as:

"Audi (ADI), Mercedes (modelx) (MEX), Ferrari super fast, high PS (FEH)"

There hundreds of models that are described by a name and an abbreviation of three capitalized letters in brackets.

I need to extract the names only and the abbreviations only to separate cells. I succeeded doing this for the abbreviations by the following module:

Function extrABR(cellRef) As String
    Dim RE As Object, MC As Object, M As Object
    Dim sTemp As Variant
    Const sPat As String = "([A-Z][A-Z][A-Z][A-Z]?)"  ' this is my regex to match my string
    
    
Set RE = CreateObject("vbscript.regexp")
With RE
    .Global = True
    .MultiLine = True
    .Pattern = sPat
    If .Test(cellRef) Then
        Set MC = .Execute(cellRef)
        For Each M In MC
            sTemp = sTemp & ", " & M.SubMatches(0)
        Next M
    End If
End With

extrABR = Mid(sTemp, 3)

End Function 

However, I do not manage to do so for names. I thought of just exchanging the regex by the following regex: (^(.*?)(?= \([A-Z][A-Z][A-Z])|(?<=, )(.*)(?= \([A-Z][A-Z][A-Z])), but VBA does not seem to allow lookbehind.

Any idea?

Upvotes: 1

Views: 1351

Answers (2)

Naresh
Naresh

Reputation: 3034

RE.REPLACE -- Try this function.. anything between the parenthesis will be replaced with "" giving you string of model names only, which you can then split on comma and get string array if so desired.

Function ModelNames(cellRef) As String
    Dim RE As Object, MC As Object, M As Object
    Dim sTemp As Variant, sPat As String
    sPat = "\([^)]+\)"
'Or you can use your formula pattern "([A-Z][A-Z][A-Z][A-Z]?)" to get (modelx)  in the final output.

Set RE = CreateObject("vbscript.regexp")
With RE
    .Global = True
    .MultiLine = True
    .Pattern = sPat
End With

ModelNames = RE.Replace(cellRef, "")

End Function

Upvotes: 0

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626738

Correct, lookbehinds are not supported, but they are only necessary when your expected matches overlap. It is not the case here, all your matches are non-overlapping. So, you can again rely on capturing:

(?:^|,)\s*(.*?)(?=\s*\([A-Z]{3,}\))

See the regex demo. Group 1 values are accessed via .Submatches(0).

Details:

  • (?:^|,) - either start of a string or a comma
  • \s* - zero or more whitespace chars
  • (.*?) - Capturing group 1: any zero or more chars other than line break chars as few as possible
  • (?=\s*\([A-Z]{3,}\)) - a positive lookahead that matches a location that is immediately followed with
    • \s* - zero or more whitespace chars
    • \( - a ( char
    • [A-Z]{3,} - three or more uppercase chars
    • \) - a ) char.

Demo screenshot:

enter image description here

Upvotes: 1

Related Questions