Tolure
Tolure

Reputation: 879

How to find hidden characters in excel

Today I have an odd issue. The text seen in excel is not the same as what the program sees.

In my cell I see.

[NOM DU GARANT], une société en commandite constituée et enregistrée en vertu de la loi Canadain (‍‍‍ci‍-après le « garant »)

but when I load this into a variable in VBA and print it out in my immediate window I get the following.

?sText

[NOM DU GARANT], une société en commandite constituée et enregistrée en vertu de la loi Canadain (???ci?­????après le « garant »)

When I pasted the cell here this is what I get.

[NOM DU GARANT], une société en commandite constituée et enregistrée en vertu de la loi Canadain (‍‍‍ci‍­‍‍‍‍après le « garant »)

When I go into the cell to I could delete the "hidden" characters.

My question is how can I find / remove / see these hidden characters?

Also is there a way to mitigate this issue. I assume these issues come from people copying from a French MS Word application and pasting into this Excel cell (I have no concrete evidence to support this).

Upvotes: 1

Views: 9835

Answers (2)

Ron Rosenfeld
Ron Rosenfeld

Reputation: 60224

The problem is that the VBA Editor does not display most UNICODE characters. Specifically, it's ability to display characters is limited to those on the Windows Code Page 1252 (the 8-byte characters).

It can manipulate higher characters, but they won't be displayed in the immediate window (or message boxes, for that matter).

I'm not sure why you feel you need to remove these characters, but:

You can determine the errant characters by displaying each character one at a time, and converting to both standard character code, and Unicode character codes.

In Excel, you might do:

A1:  your_text_string
B2:  =MID($A$1,ROWS($1:1),1)
C2:  =CODE(B2)
D2:  =UNICODE(B2)

Select B2:D2 and fill down as far as needed.

You can do something similar in VBA.

Your text string includes the Unicode ZWJ (zero-width-joiner) character for which the decimal code is 8205. So, although this works perfectly OK on your worksheet, it converts to ?'s in the VBA editor.

If you need to remove them, you can do it on the worksheet with

=SUBSTITUTE(A1,UNICHAR(8205),"")

or, the equivalent VBA code:

replace([a1],chrw(8205),"")

If you are not familiar, suggest internet search for ZWJ, the VBA ChrW function, and the Excel UNICHAR function.

If this is not an isolated instance, but you need to have a routine to remove any characters which cannot be displayed by the VBA editor, you can try this:

Option Explicit
Function removeChars(s As String)
    Dim i As Long, l As Long, sTemp As String
    Dim ch As String

For i = 1 To Len(s)
    ch = Mid(s, i, 1)
    If AscW(ch) <= 255 Then
        sTemp = sTemp & ch
    End If
Next i    

removeChars = sTemp

End Function

And, to show it works, here are screenshots of a MsgBox generated from the raw text, and the text produced by the removeChars UDF above:

enter image description here

enter image description here

I don't know if it would be any faster, but here is another UDF using Regular Expressions which can also do the same thing:

Option Explicit
Function removeChars(s As String)
    Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
    .Pattern = "[^\x00-\xFF]+"
    .Global = True
    removeChars = .Replace(s, "")
End With

End Function

This last UDF can be sped up a bit by using early-binding, and, if you have many strings to process, by doing it within VBA arrays.

Upvotes: 3

Asger
Asger

Reputation: 3877

You may recognize cells with such text by comparing Len(Range.Text) and Len(Range.Value2).

You may experiment with StrConv(string, vbFromUnicode) and StrConv(string, vbUnicode).

You may clean it with WorksheetFunction.Clean() (works only on lower ASCII codes).
If clean is not effective, I would try to find out the different character codes e. g. by Debug.Print Asc(Mid(Range.Value2, 1, 1)) and build a more/less difficult Replace function.


If you want to retain only a predefined set of characters, here's a simple function:

Private Function ReducedCharset(ByRef s As String) As String
    Dim i As Long
    Dim c As String
    Dim Temp As String
    Temp = ""
    For i = 1 To Len(s)
        c = Mid(s, i, 1)
        If c Like "[0-9A-Za-z€ÄÖÜßäöü]" Then
            Temp = Temp & c
        End If
    Next i
    ReducedCharset = Temp
End Function

Upvotes: 2

Related Questions