Reputation: 593
I'm trying to remove all non-printable and non-ASCII (extended) characters using the following RegEx in Excel VBA:
[^\x09\0A\0D\x20-\xFF]
This should theoretically match anything that's not a tab, linefeed, carriage return or printable ASCII character (character code between hex 20 and FF or dec 32 and 255). I have confirmed here that Microsoft VBScript regular expressions support the \xCC notation where CC is an ASCII code in hexadecimal.
The problem is that this regex is matching every character above 127. It's then throwing an "invalid procedure call" on match.value when the matching character's code is above 127. Is it just that VBScript RegExes don't support character codes above 127? I can't seem to find this data anywhere. Here's the full code:
regEx.Pattern = "[^\x09\0A\0D\x20-\xFF]"
regEx.IgnoreCase = True 'True to ignore case
regEx.Global = True 'True matches all occurances, False matches the first occurance
regEx.MultiLine = True
If regEx.Test(Cells(curRow, curCol).Value) Then
Set matches = regEx.Execute(Cells(curRow, curCol).Value)
numReplacements = numReplacements + matches.Count
For matchNum = matches.Count To 1 Step -1
Cells(numReplacements - matchNum + 2, 16).Value = matches.Item(matchNum).Value
Cells(numReplacements - matchNum + 2, 17).Value = Asc(matches.Item(matchNum).Value)
Next matchNum
Cells(curRow, curCol).Value = regEx.Replace(Cells(curRow, curCol).Value, replacements(pattNo))
End If
The first character it matches is 0x96 (&ndash). I can see it in the "Watches" window when I watch "matches" and expand it. However, when I try to watch matches.Item(matchNum).Value I get (see screenshot). Any ideas?
Upvotes: 4
Views: 2485
Reputation: 30153
Microsoft VBScript regular expressions support the \xCC notation where CC is an ASCII code in hexadecimal
Note that ASCII is defined from \x00 to \x7F, where printable ASCII characters are from \x20 to \x7E.
Codes \x80 and above are Ansi, not ASCII.
Try next:
Dim ii, sExPatern: sExPatern = "[^\x09\x0A\x0D\x20-\x7E\"
For ii = 128 To 255
sExPatern = sExPatern & Chr( ii)
Next
sExPatern = sExPatern & "]"
'...
regEx.Pattern = sExPatern
Honestly, I'm not sure on pritability of some codes, e.g. 129, 131, 136, 144, 152, 160 in decimal (my Ansi code page is "Windows Central Europe", so you may consider more detailed examination)
Upvotes: 1