ChaosFreak
ChaosFreak

Reputation: 593

RegEx in Excel VBA Matching Extended ASCII Chars Improperly

I'm trying to remove all non-printable and non-ASCII (extended) characters using the following RegEx in Excel VBA:

[^\x09\0A\0D\x20-\xFF]

This should theoretically match anything that's not a tab, linefeed, carriage return or printable ASCII character (character code between hex 20 and FF or dec 32 and 255). I have confirmed here that Microsoft VBScript regular expressions support the \xCC notation where CC is an ASCII code in hexadecimal.

The problem is that this regex is matching every character above 127. It's then throwing an "invalid procedure call" on match.value when the matching character's code is above 127. Is it just that VBScript RegExes don't support character codes above 127? I can't seem to find this data anywhere. Here's the full code:

regEx.Pattern = "[^\x09\0A\0D\x20-\xFF]"
regEx.IgnoreCase = True 'True to ignore case
regEx.Global = True 'True matches all occurances, False matches the first occurance
regEx.MultiLine = True
If regEx.Test(Cells(curRow, curCol).Value) Then
    Set matches = regEx.Execute(Cells(curRow, curCol).Value)
    numReplacements = numReplacements + matches.Count
    For matchNum = matches.Count To 1 Step -1
        Cells(numReplacements - matchNum + 2, 16).Value = matches.Item(matchNum).Value
        Cells(numReplacements - matchNum + 2, 17).Value = Asc(matches.Item(matchNum).Value)
    Next matchNum
    Cells(curRow, curCol).Value = regEx.Replace(Cells(curRow, curCol).Value, replacements(pattNo))
End If

The first character it matches is 0x96 (&ndash). I can see it in the "Watches" window when I watch "matches" and expand it. However, when I try to watch matches.Item(matchNum).Value I get (see screenshot). Any ideas?

Upvotes: 4

Views: 2485

Answers (1)

JosefZ
JosefZ

Reputation: 30153

Microsoft VBScript regular expressions support the \xCC notation where CC is an ASCII code in hexadecimal

Note that ASCII is defined from \x00 to \x7F, where printable ASCII characters are from \x20 to \x7E.

Codes \x80 and above are Ansi, not ASCII.

Try next:

Dim ii, sExPatern: sExPatern = "[^\x09\x0A\x0D\x20-\x7E\"
For ii = 128 To 255
  sExPatern = sExPatern & Chr( ii)
Next
sExPatern = sExPatern & "]"
'...
regEx.Pattern = sExPatern

Honestly, I'm not sure on pritability of some codes, e.g. 129, 131, 136, 144, 152, 160 in decimal (my Ansi code page is "Windows Central Europe", so you may consider more detailed examination)

Upvotes: 1

Related Questions