MI285
MI285

Reputation: 21

Excel VBA nested regex replace

Is it possible to have nested regular-expression replace functions in Excel VBA and/or as a cell formula?

for example I have the following text: "Lorem [ipsum dolor sit] amet, [consetetur sadipscing] elitr." (note the square brackets)

is there any possibility to transform it to: "Lorem ipsum_dolor_sit amet, consetetur_sadipscing elitr."?

so i want to:

  1. find all the terms inside square brackets
  2. inside them: replace all blanks with an underscore
  3. remove the square brackets

and get returned the whole sentence with these replacements.

with my current attemp, I can only remove the brackets:

Function RegexReplace(...)
...
Set oRegex = CreateObject("VBScript.RegExp")
oRegex.Pattern = "\[([^\[\]]*)\]"
oRegex.Replace(strInput, "$1")
...

and nesting this function as formula in a Cell or inside the Code oRegex.Replace(strInput, Replace("$1", " ", "_") seems not to be possible, because the nested replace-function Replace(.. is called before evaluating the replacement string $1, so there is no blank to replace.

any proposals for solution? Thanks :-)

Upvotes: 1

Views: 971

Answers (2)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626950

You may replace all spaces with underscores in your matches if you modify the code like

Function RegexReplace(cell As Variant) As String
Dim oRegex As Object, m As Object, offset As Long
Set oRegex = CreateObject("VBScript.RegExp")
offset = 0
With oRegex
    .Pattern = "\[([^[\]]*)]"
    .Global = True
End With
RegexReplace = cell.Text
For Each m In oRegex.Execute(RegexReplace)
    RegexReplace = Left(RegexReplace, m.FirstIndex - offset) & Replace(m.SubMatches(0), " ", "_") & Mid(RegexReplace, m.FirstIndex + 1 + Len(m.Value) - offset)
    offset = offset + 2
Next m
End Function

The \[([^[\]]*)] pattern will match [, then captures into Group 1 zero or more chars other than [ and ] and then just matches a ]. Then, you iterate through all matches with For Each m In oRegex.Execute(RegexReplace) and once a match is found, a substring before the match, the match with replace spaces (using Replace(m.Submatches(0), " ", "_")) and the text after match are concatenated to form the function output.

Upvotes: 0

MI285
MI285

Reputation: 21

another solution would be to execute the regex with every loop und take only the first match and update the result for that. so the character indexes are updated every loop and you can use any replacement that changes the count of characters. here i also use the submatch, that is without the brackets (see Pattern).

Function RegexReplace(cell As Variant) As String
Dim oRegex As Object, m As Object
Dim strMatch As String

Set oRegex = CreateObject("VBScript.RegExp")
With oRegex
    .Pattern = "\[([^\[\]]*)\]"
    .Global = False
End With
RegexReplace = cell.Text
Do While oRegex.Test(RegexReplace)
    Set allMatches = oRegex.Execute(RegexReplace)
    Set FirstMatch = allMatches(0)
    strMatch = Replace(FirstMatch.SubMatches(0), " ", "_")
    RegexReplace = Left(RegexReplace, FirstMatch.FirstIndex) & strMatch & Mid(RegexReplace, FirstMatch.FirstIndex + Len(FirstMatch.Value) + 1)
    DoEvents
Loop

End Function

Upvotes: 1

Related Questions