Reputation: 151
After creating a website called http://vbfiddle.net to implement and test @ctwheel's VBScript solution in a browser (MS IE 10+ with Security set to "Medium", instructions on that website for how to set it up for you to play with should you want -- get the code to copy & paste into vbfiddle.net from this link at jsfiddle.net: https://jsfiddle.net/hcwjhmg9/ [vbfiddle.net does not currently have a "save" feature] ), I found that @ctwheel's VBScript RegEx ran successfully, even for the 3rd example line I gave, but when @ctwheel's VBScript RegEx is used in VBScript for VBA for Microsoft Access 2016 against a record read from a database with the "same" value, the third subgroup only returns "Ray," for the 3rd example line I gave, when it should return "Ray, CFP" like it does in vbfiddle.net.
It finally occurred to me to iterate through every character of the string value returned by the database (in VBA in Microsoft Access), and compare it to an iteration of every character of the visually-equivalent string value I type directly into the code (in VBA in Microsoft Access). I get the following results:
First Name and Last Name: "G.L. (Pete) Ray, CFP"
--- 1st Text chars: "71 46 76 46 32 40 80 101 116 101 41 32 82 97 121 44"
(Read value from database, appears same as below when Debug.Print is called on it)
--- 2nd Text chars: "71 46 76 46 32 40 80 101 116 101 41 32 82 97 121 44 32 67 70 80" (Typed by keyboard into a string within the code)
'G.L. (Pete) Ray,'
strProperName>objSubMatch: "G.L."
strProperName>objSubMatch: "Pete"
strProperName>objSubMatch: "Ray,"
Matching record(s): 3 of 1132 record(s).
The RegEx I'm running is running against the "1st Text Chars
" example, and returns "Ray," for the 3rd subgroup of the previously given 3rd example line: "G.L. (Pete) Ray, CFP
". However, if I run the RegEx against the 2nd -- typed directly into code -- "2nd Text chars
" example, the 3rd subgroup returns "Ray, CFP" as expected in VBA for Microsoft Access 2016.
I'm now using the RegEx that @ctwheels provided:
^([^(]+?)\s*\(\s*([^)]*?)\s*\)\s*(.*)
Can someone explain what's going on here? 1) Why are the characters returned from the database different from the characters returned from typing the string using a keyboard by reading and copying it visually? 2) How do I make a RegEx that works on the "1st Text Chars
" sequence of characters / string return the correct 3rd subgroup: "Ray, CFP
" when the value is read directly from the database?
ORIGINAL QUESTION (updated question above):
I'm having problems in VBA using Microsoft Access 2016 with Regex Engine I believe 5.5 for VBScript.
This is the regex expression I'm currently using:
"(.*)\((.*)(\))(.*)"
I'm trying to parse the strings (respectively on each new line):
Lawrence N. (Larry) Solomon
James ( Jim ) Alleman
G.L. (Pete) Ray, CFP
Into:
"Lawrence N.", "Larry", ")", "Solomon"
"James", "Jim", ")", "Alleman"
"G.L.", "Pete", ")", "Ray, CFP"
Or alternatively (and preferably) into:
"Lawrence N.", "Larry", "Solomon"
"James", "Jim", "Alleman"
"G.L.", "Pete", "Ray, CFP"
where the parts within the quotes, separated by commas, are those returned in the submatches (without quotes)
I am using the following code:
' For Proper Name (strProperName):
With objRegex
.Global = False
.MultiLine = False
.IgnoreCase = True
.Pattern = "(.*)\((.*)(\))(.*)"
'([\s|\S]*) work around to match every character?
'".*\(([^\s]*)[\s]*\(" '_
''& "[\"
'[\(][\s]*([.|^\w]*)[\s]*\)"
' "[\s]*(.*)[\s]*\("
' does same as below except matches any or no whitespace preceding any characters,
' and returns the following characters up to an opening parenthesis ("(") but excluding it,
' as the first subgroup
' "(.*)[\s]*\("
' does same as below except matches any whitespace or no whitespace at all followed by an opening parenthesis ("(")
' and returns the preceding characters as the first subgroup
' "(.*)\("
' matches all characters in a row that end with an open parenthesis, and returns all of these characters in a row
' excluding the following open parenthesis as the first subgroup
' "(.*?\(\s)"
' "[^\(]*"
' this pattern returns every character that isn't an opening parenthesis ("("), and when
' it matches an open parenthesis, it does not return it or any characters after it
' "[\(\s](.*)\)"
' this pattern extracts everything between parenthesis in a line as its first submatch
' "(?<=\().*"
' "[^[^\(]*][.*]"
' "(\(.*?\))"
' "(\(.*?\))*([^\(].*[^\)])"
End With
If objRegex.Test(strFirstNameTrimmed) Then
'Set strsMatches = objRegex.Execute(rs.Fields("First Name"))
Set strsMatches = objRegex.Execute(strFirstNameTrimmed)
Debug.Print "2:'" & strsMatches(0).Value & "'"
If strsMatches(0).SubMatches.Count > 0 Then
For Each objSubMatch In strsMatches(0).SubMatches
Debug.Print " strProperName>objSubMatch: """ & objSubMatch & """" 'Result: 000, 643, 888"
strProperName = objSubMatch
Next objSubMatch
End If
Else
strProperName = "*Not Matched*"
End If
Produces the following output in the debug window / "Immediate Window" as it's known in VBA, brought up by (Ctrl+G):
------------------------
First Name and Last Name: "Lawrence N. (Larry) Solomon"
2:'Lawrence N. (Larry)'
strProperName>objSubMatch: "Lawrence N. "
strProperName>objSubMatch: "Larry"
strProperName>objSubMatch: ")"
strProperName>objSubMatch: ""
Extracted Nick Name: "Larry"
Extracted Proper Name: ""
First Name and Last Name: "James ( Jim ) Alleman"
2:'James ( Jim )'
strProperName>objSubMatch: "James "
strProperName>objSubMatch: " Jim "
strProperName>objSubMatch: ")"
strProperName>objSubMatch: ""
Extracted Nick Name: "Jim"
Extracted Proper Name: ""
First Name and Last Name: "G.L. (Pete) Ray, CFP"
2:'G.L. (Pete) Ray,'
strProperName>objSubMatch: "G.L. "
strProperName>objSubMatch: "Pete"
strProperName>objSubMatch: ")"
strProperName>objSubMatch: " Ray,"
Extracted Nick Name: "Pete"
Extracted Proper Name: " Ray,"
Matching record(s): 3 of 1132 record(s).
Upvotes: 2
Views: 491
Reputation: 5677
You should be able to avoid using Regex, if that's your thing.
I made some assumptions about the test data that the nickname is contained within "()". Other than that the code should be straightforward, I hope. If not, feel free to ask a question. There is a Test routine called Test
included too.
Public Function ParseString(InputString As String) As String
On Error GoTo ErrorHandler:
Dim OutputArray As Variant
Const DoubleQuote As String = """"
'Quick exit, if () aren't found, then just return original text
If InStr(1, InputString, "(") = 0 Or InStr(1, InputString, ")") = 0 Then
ParseString = InputString
Exit Function
End If
'Replace the ) with (, then do a split
OutputArray = Split(Replace(InputString, ")", "("), "(")
'Check the array bounds and output accordingly
'If there can only ever be 3 (0 - 2) elements, then you can change this if statement
If UBound(OutputArray) = 2 Then
ParseString = DoubleQuote & Trim$(OutputArray(0)) & DoubleQuote & ", " & _
DoubleQuote & Trim$(OutputArray(1)) & DoubleQuote & ", " & _
DoubleQuote & Trim$(OutputArray(2)) & DoubleQuote
ElseIf UBound(OutputArray) = 1 Then
ParseString = DoubleQuote & Trim$(OutputArray(0)) & DoubleQuote & ", " & _
DoubleQuote & Trim$(OutputArray(1)) & DoubleQuote
Else
ParseString = DoubleQuote & Trim$(OutputArray(LBound(OutputArray))) & DoubleQuote
End If
CleanExit:
Exit Function
ErrorHandler:
ParseString = InputString
Resume CleanExit
End Function
Sub Test()
Dim Arr() As Variant: Arr = Array("Lawrence N. (Larry) Solomon", "James ( Jim ) Alleman", "G.L. (Pete) Ray, CFP")
For i = LBound(Arr) To UBound(Arr)
Debug.Print ParseString(CStr(Arr(i)))
Next
End Sub
Results
"Lawrence N.", "Larry", "Solomon"
"James", "Jim", "Alleman"
"G.L.", "Pete", "Ray, CFP"
Upvotes: 1
Reputation: 3405
Regex: \s*[()]\s*
Details:
\s*
matches any whitespace character zero and unlimited times[()]
Match a single character present in the list (
or )
VBA code:
Dim str As String
str = "Lawrence N. (Larry) Solomon"
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.Pattern = "\s*[()]\s*"
re.MultiLine = True
Dim arr As Variant
arr = Strings.Split(re.Replace(str, vbNullChar), vbNullChar)
For Each Match In arr
Debug.Print (Match)
Next
Output:
Lawrence N.
Larry
Solomon
Upvotes: 0
Reputation: 22837
^([^(]+?)\s*\(\s*([^)]*?)\s*\)\s*(.*)
^
Assert position at the start of the line([^(]+?)
Capture any character except (
one or more times, but as few as possible, into capture group 1\s*
Match any number of whitespace characters\(
Match (
literally\s*
Match any number of whitespace characters([^)]*?)
Capture any character except )
one or more times, but as few as possible, into capture group 2\s*
Match any number of whitespace characters\(
Match (
literally\s*
Match any number of whitespace characters(.*)
Capture the rest of the line into capture group 3Results in:
["Lawrence N.", "Larry", "Solomon"]
["James", "Jim", "Alleman"]
["G.L.", "Pete", "Ray, CFP"]
Upvotes: 1