totesbread
totesbread

Reputation: 13

Extract Regex Value in Excel Visual basic

I have a Visual Basic code ready but it extracts everything but the matching regex.

The sheet looks like this

Here's the code I use:

Function simpleCellRegex(Myrange As Range) As String
Dim regEx As New RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As String

strPattern = "[a-zA-Z]{4}\d{4}$"

If strPattern <> "" Then
    strInput = Myrange.Value
    strReplace = ""
    
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .pattern = strPattern
    End With
    
    If regEx.test(strInput) Then
        simpleCellRegex = regEx.Replace(strInput, strReplace)
    Else
        simpleCellRegex = "Not matched"
    End If
End If
End Function

I use Excel 2019 on Windows.

Upvotes: 0

Views: 202

Answers (1)

JvdV
JvdV

Reputation: 75990

VBA:

You could still do this via regex replace adjusting your pattern, but rather pull the value directly from the matchobject, for example:

Function simpleCellRegex(s As String) As String

With CreateObject("vbscript.regexp")
    .Pattern = "(?:^|\s)([a-zA-Z]{4}\d{4})$"
    If .Test(s) Then
        simpleCellRegex = .Execute(s)(0).submatches(0)
    Else
        simpleCellRegex = "No Match"
    End If
End With

End Function

You'll notice I edited the pattern to (?:^|\s)([a-zA-Z]{4}\d{4})$. This would ensure that the previous character is either non-existing or is a whitespace. This should prevent possible false positives when string is part of a larger substring. Even word-boundaries won't help in case of 'test-abcd1000'. See an online demo. This does, however, require us to acces the submatches from the matchobject. Both indexes are zero-based. Another option is to use Trim(.Execute(s)(0)) to get rid of the potential whitespace.


FILTERXML():

Though regex is a viable option, it does require your workbook to be macro-enabled. You could opt for pattern validation with native function, for example FILTERXML():

=IFERROR(FILTERXML("<t><s>"&SUBSTITUTE(A2;" ";"</s><s>")&"</s></t>";"//s[last()][translate(substring(.,1,4), 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ','')=''][string-length()=8][substring(.,5)*0=0]");"No Match")

The meaning of the xpath used:

  • //s[last()] - Get last element in array. In regex term; validate we are looking at the text before the end-line anchor;
  • [translate(substring(.,1,4), 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ','')=''] - When first four characters are translated (alpha to empty string) check if remainder equals empty;
  • [string-length()=8] - Check if lenght of string is 8 characters;
  • [substring(.,5)*0=0] - Check that substring from 5th characters onwards equals zero when multiplied by zero.

enter image description here

Upvotes: 1

Related Questions