blueuser
blueuser

Reputation: 151

Specific VBA / VBScript regex related issue (MS Access)


UPDATE: March 12, 2018 1:44pm CST

After creating a website called http://vbfiddle.net to implement and test @ctwheel's VBScript solution in a browser (MS IE 10+ with Security set to "Medium", instructions on that website for how to set it up for you to play with should you want -- get the code to copy & paste into vbfiddle.net from this link at jsfiddle.net: https://jsfiddle.net/hcwjhmg9/ [vbfiddle.net does not currently have a "save" feature] ), I found that @ctwheel's VBScript RegEx ran successfully, even for the 3rd example line I gave, but when @ctwheel's VBScript RegEx is used in VBScript for VBA for Microsoft Access 2016 against a record read from a database with the "same" value, the third subgroup only returns "Ray," for the 3rd example line I gave, when it should return "Ray, CFP" like it does in vbfiddle.net.

It finally occurred to me to iterate through every character of the string value returned by the database (in VBA in Microsoft Access), and compare it to an iteration of every character of the visually-equivalent string value I type directly into the code (in VBA in Microsoft Access). I get the following results:

First Name and Last Name: "G.L. (Pete) Ray, CFP"
--- 1st Text chars: "71 46 76 46 32 40 80 101 116 101 41 32 82 97 121 44" 
(Read value from database, appears same as below when Debug.Print is called on it)
--- 2nd Text chars: "71 46 76 46 32 40 80 101 116 101 41 32 82 97 121 44 32 67 70 80" (Typed by keyboard into a string within the code)
'G.L. (Pete) Ray,'
    strProperName>objSubMatch: "G.L."
    strProperName>objSubMatch: "Pete"
    strProperName>objSubMatch: "Ray,"
 Matching record(s): 3 of 1132 record(s).

The RegEx I'm running is running against the "1st Text Chars" example, and returns "Ray," for the 3rd subgroup of the previously given 3rd example line: "G.L. (Pete) Ray, CFP". However, if I run the RegEx against the 2nd -- typed directly into code -- "2nd Text chars" example, the 3rd subgroup returns "Ray, CFP" as expected in VBA for Microsoft Access 2016.

I'm now using the RegEx that @ctwheels provided:

^([^(]+?)\s*\(\s*([^)]*?)\s*\)\s*(.*)

Can someone explain what's going on here? 1) Why are the characters returned from the database different from the characters returned from typing the string using a keyboard by reading and copying it visually? 2) How do I make a RegEx that works on the "1st Text Chars" sequence of characters / string return the correct 3rd subgroup: "Ray, CFP" when the value is read directly from the database?


ORIGINAL QUESTION (updated question above):

I'm having problems in VBA using Microsoft Access 2016 with Regex Engine I believe 5.5 for VBScript.

This is the regex expression I'm currently using:

"(.*)\((.*)(\))(.*)"

I'm trying to parse the strings (respectively on each new line):

Lawrence N. (Larry) Solomon
James ( Jim ) Alleman
G.L. (Pete) Ray, CFP

Into:

"Lawrence N.", "Larry", ")", "Solomon"
"James", "Jim", ")", "Alleman"
"G.L.", "Pete", ")", "Ray, CFP"

Or alternatively (and preferably) into:

"Lawrence N.", "Larry", "Solomon"
"James", "Jim", "Alleman"
"G.L.", "Pete", "Ray, CFP"

where the parts within the quotes, separated by commas, are those returned in the submatches (without quotes)

I am using the following code:

           ' For Proper Name (strProperName):
            With objRegex
                .Global = False
                .MultiLine = False
                .IgnoreCase = True
                .Pattern = "(.*)\((.*)(\))(.*)"

                    '([\s|\S]*) work around to match every character?

                        '".*\(([^\s]*)[\s]*\(" '_
                        ''& "[\"
                        '[\(][\s]*([.|^\w]*)[\s]*\)"
                    ' "[\s]*(.*)[\s]*\("
                        ' does same as below except matches any or no whitespace preceding any characters,
                        ' and returns the following characters up to an opening parenthesis ("(") but excluding it,
                        ' as the first subgroup
                    ' "(.*)[\s]*\("
                        ' does same as below except matches any whitespace or no whitespace at all followed by an opening parenthesis ("(")
                        ' and returns the preceding characters as the first subgroup
                    ' "(.*)\("
                        ' matches all characters in a row that end with an open parenthesis, and returns all of these characters in a row
                        ' excluding the following open parenthesis as the first subgroup
                    ' "(.*?\(\s)"
                    ' "[^\(]*"
                        ' this pattern returns every character that isn't an opening parenthesis ("("), and when
                        ' it matches an open parenthesis, it does not return it or any characters after it
                    ' "[\(\s](.*)\)"
                        ' this pattern extracts everything between parenthesis in a line as its first submatch
                    ' "(?<=\().*"
                    ' "[^[^\(]*][.*]"
                    ' "(\(.*?\))"
                    ' "(\(.*?\))*([^\(].*[^\)])"
            End With

            If objRegex.Test(strFirstNameTrimmed) Then
                'Set strsMatches = objRegex.Execute(rs.Fields("First Name"))
                Set strsMatches = objRegex.Execute(strFirstNameTrimmed)



                Debug.Print "2:'" & strsMatches(0).Value & "'"

                If strsMatches(0).SubMatches.Count > 0 Then

                    For Each objSubMatch In strsMatches(0).SubMatches

                        Debug.Print "    strProperName>objSubMatch: """ & objSubMatch & """" 'Result: 000, 643, 888"

                        strProperName = objSubMatch

                    Next objSubMatch

                End If
            Else
                strProperName = "*Not Matched*"
            End If

Produces the following output in the debug window / "Immediate Window" as it's known in VBA, brought up by (Ctrl+G):

------------------------
First Name and Last Name: "Lawrence N. (Larry) Solomon"
2:'Lawrence N. (Larry)'
    strProperName>objSubMatch: "Lawrence N. "
    strProperName>objSubMatch: "Larry"
    strProperName>objSubMatch: ")"
    strProperName>objSubMatch: ""
Extracted Nick Name: "Larry"
Extracted Proper Name: ""
First Name and Last Name: "James ( Jim ) Alleman"
2:'James ( Jim )'
    strProperName>objSubMatch: "James "
    strProperName>objSubMatch: " Jim "
    strProperName>objSubMatch: ")"
    strProperName>objSubMatch: ""
Extracted Nick Name: "Jim"
Extracted Proper Name: ""
First Name and Last Name: "G.L. (Pete) Ray, CFP"
2:'G.L. (Pete) Ray,'
    strProperName>objSubMatch: "G.L. "
    strProperName>objSubMatch: "Pete"
    strProperName>objSubMatch: ")"
    strProperName>objSubMatch: " Ray,"
Extracted Nick Name: "Pete"
Extracted Proper Name: " Ray,"
Matching record(s): 3 of 1132 record(s).

Upvotes: 2

Views: 491

Answers (3)

Ryan Wildry
Ryan Wildry

Reputation: 5677

You should be able to avoid using Regex, if that's your thing.

I made some assumptions about the test data that the nickname is contained within "()". Other than that the code should be straightforward, I hope. If not, feel free to ask a question. There is a Test routine called Test included too.

Public Function ParseString(InputString As String) As String
    On Error GoTo ErrorHandler:

    Dim OutputArray   As Variant
    Const DoubleQuote As String = """"

    'Quick exit, if () aren't found, then just return original text
    If InStr(1, InputString, "(") = 0 Or InStr(1, InputString, ")") = 0 Then
        ParseString = InputString
        Exit Function
    End If

    'Replace the ) with (, then do a split
    OutputArray = Split(Replace(InputString, ")", "("), "(")

    'Check the array bounds and output accordingly
    'If there can only ever be 3 (0 - 2) elements, then you can change this if statement
    If UBound(OutputArray) = 2 Then
        ParseString = DoubleQuote & Trim$(OutputArray(0)) & DoubleQuote & ", " & _
                      DoubleQuote & Trim$(OutputArray(1)) & DoubleQuote & ", " & _
                      DoubleQuote & Trim$(OutputArray(2)) & DoubleQuote
    ElseIf UBound(OutputArray) = 1 Then
        ParseString = DoubleQuote & Trim$(OutputArray(0)) & DoubleQuote & ", " & _
                      DoubleQuote & Trim$(OutputArray(1)) & DoubleQuote
    Else
        ParseString = DoubleQuote & Trim$(OutputArray(LBound(OutputArray))) & DoubleQuote
    End If

CleanExit:
    Exit Function

ErrorHandler:
    ParseString = InputString
    Resume CleanExit
End Function

Sub Test()
    Dim Arr() As Variant: Arr = Array("Lawrence N. (Larry) Solomon", "James ( Jim ) Alleman", "G.L. (Pete) Ray, CFP")

    For i = LBound(Arr) To UBound(Arr)
        Debug.Print ParseString(CStr(Arr(i)))
    Next
End Sub

Results

"Lawrence N.", "Larry", "Solomon"
"James", "Jim", "Alleman"
"G.L.", "Pete", "Ray, CFP"

Upvotes: 1

Srdjan M.
Srdjan M.

Reputation: 3405

Regex: \s*[()]\s*

Details:

  • \s* matches any whitespace character zero and unlimited times
  • [()] Match a single character present in the list ( or )

VBA code:

Dim str As String
str = "Lawrence N. (Larry) Solomon"

Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\s*[()]\s*"
re.MultiLine = True

Dim arr As Variant
arr = Strings.Split(re.Replace(str, vbNullChar), vbNullChar)

For Each Match In arr
    Debug.Print (Match)
Next

Output:

Lawrence N.
Larry
Solomon

Upvotes: 0

ctwheels
ctwheels

Reputation: 22837

See regex in use here

^([^(]+?)\s*\(\s*([^)]*?)\s*\)\s*(.*)
  • ^ Assert position at the start of the line
  • ([^(]+?) Capture any character except ( one or more times, but as few as possible, into capture group 1
  • \s* Match any number of whitespace characters
  • \( Match ( literally
  • \s* Match any number of whitespace characters
  • ([^)]*?) Capture any character except ) one or more times, but as few as possible, into capture group 2
  • \s* Match any number of whitespace characters
  • \( Match ( literally
  • \s* Match any number of whitespace characters
  • (.*) Capture the rest of the line into capture group 3

Results in:

["Lawrence N.", "Larry", "Solomon"]
["James", "Jim", "Alleman"]
["G.L.", "Pete", "Ray, CFP"]

Upvotes: 1

Related Questions