Reputation: 879
Today I have an odd issue. The text seen in excel is not the same as what the program sees.
In my cell I see.
[NOM DU GARANT], une société en commandite constituée et enregistrée en vertu de la loi Canadain (ci-après le « garant »)
but when I load this into a variable in VBA and print it out in my immediate window I get the following.
?sText
[NOM DU GARANT], une société en commandite constituée et enregistrée en vertu de la loi Canadain (???ci?????après le « garant »)
When I pasted the cell here this is what I get.
[NOM DU GARANT], une société en commandite constituée et enregistrée en vertu de la loi Canadain (ciaprès le « garant »)
When I go into the cell to I could delete the "hidden" characters.
My question is how can I find / remove / see these hidden characters?
Also is there a way to mitigate this issue. I assume these issues come from people copying from a French MS Word application and pasting into this Excel cell (I have no concrete evidence to support this).
Upvotes: 1
Views: 9835
Reputation: 60224
The problem is that the VBA Editor does not display most UNICODE characters. Specifically, it's ability to display characters is limited to those on the Windows Code Page 1252 (the 8-byte characters).
It can manipulate higher characters, but they won't be displayed in the immediate window (or message boxes, for that matter).
I'm not sure why you feel you need to remove these characters, but:
You can determine the errant characters by displaying each character one at a time, and converting to both standard character code, and Unicode character codes.
In Excel, you might do:
A1: your_text_string
B2: =MID($A$1,ROWS($1:1),1)
C2: =CODE(B2)
D2: =UNICODE(B2)
Select B2:D2 and fill down as far as needed.
You can do something similar in VBA.
Your text string includes the Unicode ZWJ (zero-width-joiner) character for which the decimal code is 8205. So, although this works perfectly OK on your worksheet, it converts to ?
's in the VBA editor.
If you need to remove them, you can do it on the worksheet with
=SUBSTITUTE(A1,UNICHAR(8205),"")
or, the equivalent VBA code:
replace([a1],chrw(8205),"")
If you are not familiar, suggest internet search for ZWJ, the VBA ChrW
function, and the Excel UNICHAR
function.
If this is not an isolated instance, but you need to have a routine to remove any characters which cannot be displayed by the VBA editor, you can try this:
Option Explicit
Function removeChars(s As String)
Dim i As Long, l As Long, sTemp As String
Dim ch As String
For i = 1 To Len(s)
ch = Mid(s, i, 1)
If AscW(ch) <= 255 Then
sTemp = sTemp & ch
End If
Next i
removeChars = sTemp
End Function
And, to show it works, here are screenshots of a MsgBox
generated from the raw text, and the text produced by the removeChars
UDF above:
I don't know if it would be any faster, but here is another UDF using Regular Expressions which can also do the same thing:
Option Explicit
Function removeChars(s As String)
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Pattern = "[^\x00-\xFF]+"
.Global = True
removeChars = .Replace(s, "")
End With
End Function
This last UDF can be sped up a bit by using early-binding, and, if you have many strings to process, by doing it within VBA arrays.
Upvotes: 3
Reputation: 3877
You may recognize cells with such text by comparing Len(Range.Text)
and Len(Range.Value2)
.
You may experiment with StrConv(string, vbFromUnicode)
and StrConv(string, vbUnicode)
.
You may clean it with WorksheetFunction.Clean()
(works only on lower ASCII codes).
If clean is not effective, I would try to find out the different character codes e. g. by Debug.Print Asc(Mid(Range.Value2, 1, 1))
and build a more/less difficult Replace
function.
If you want to retain only a predefined set of characters, here's a simple function:
Private Function ReducedCharset(ByRef s As String) As String
Dim i As Long
Dim c As String
Dim Temp As String
Temp = ""
For i = 1 To Len(s)
c = Mid(s, i, 1)
If c Like "[0-9A-Za-z€ÄÖÜßäöü]" Then
Temp = Temp & c
End If
Next i
ReducedCharset = Temp
End Function
Upvotes: 2