Reputation: 642
Within an Excel column I have data such as:
"Audi (ADI), Mercedes (modelx) (MEX), Ferrari super fast, high PS (FEH)"
There hundreds of models that are described by a name and an abbreviation of three capitalized letters in brackets.
I need to extract the names only and the abbreviations only to separate cells. I succeeded doing this for the abbreviations by the following module:
Function extrABR(cellRef) As String
Dim RE As Object, MC As Object, M As Object
Dim sTemp As Variant
Const sPat As String = "([A-Z][A-Z][A-Z][A-Z]?)" ' this is my regex to match my string
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.MultiLine = True
.Pattern = sPat
If .Test(cellRef) Then
Set MC = .Execute(cellRef)
For Each M In MC
sTemp = sTemp & ", " & M.SubMatches(0)
Next M
End If
End With
extrABR = Mid(sTemp, 3)
End Function
However, I do not manage to do so for names. I thought of just exchanging the regex by the following regex: (^(.*?)(?= \([A-Z][A-Z][A-Z])|(?<=, )(.*)(?= \([A-Z][A-Z][A-Z]))
, but VBA does not seem to allow lookbehind.
Any idea?
Upvotes: 1
Views: 1351
Reputation: 3034
RE.REPLACE -- Try this function.. anything between the parenthesis will be replaced with "" giving you string of model names only, which you can then split on comma and get string array if so desired.
Function ModelNames(cellRef) As String
Dim RE As Object, MC As Object, M As Object
Dim sTemp As Variant, sPat As String
sPat = "\([^)]+\)"
'Or you can use your formula pattern "([A-Z][A-Z][A-Z][A-Z]?)" to get (modelx) in the final output.
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = True
.MultiLine = True
.Pattern = sPat
End With
ModelNames = RE.Replace(cellRef, "")
End Function
Upvotes: 0
Reputation: 626738
Correct, lookbehinds are not supported, but they are only necessary when your expected matches overlap. It is not the case here, all your matches are non-overlapping. So, you can again rely on capturing:
(?:^|,)\s*(.*?)(?=\s*\([A-Z]{3,}\))
See the regex demo. Group 1 values are accessed via .Submatches(0)
.
Details:
(?:^|,)
- either start of a string or a comma\s*
- zero or more whitespace chars(.*?)
- Capturing group 1: any zero or more chars other than line break chars as few as possible(?=\s*\([A-Z]{3,}\))
- a positive lookahead that matches a location that is immediately followed with
\s*
- zero or more whitespace chars\(
- a (
char[A-Z]{3,}
- three or more uppercase chars\)
- a )
char.Demo screenshot:
Upvotes: 1