Soleil

Reputation: 7287

How can I find Unicode characters that are not UTF-8 in Visual Studio Code?

I have Unicode characters that I can't see that are not UTF-8, and I need to spot them.

I tried the extension Highlight Bad Chars (Kevin Wenger), but it's not sufficient. In particular, I don't know which characters these are, and I don't want to have to define them in advance.

How can I do this with Visual Studio Code?

Upvotes: 53

Views: 66011

Answers (3)

Mark

Reputation: 181060

As of stable build v1.63, VS Code has a built-in way to highlight various Unicode characters that might otherwise be difficult to spot in your code. These are the new settings:

New Unicode settings

You can use these colorCustomizations to change the default orange borders:

{
  "workbench.colorCustomizations": {
    "editorUnicodeHighlight.border": "#00ff37",
    "editorUnicodeHighlight.background": "#f00"         // will be in vscode v1.66

    // "minimap.unicodeHighlight": "#ff0000",           // removed in v1.64
    // "editorOverviewRuler.unicodeForeground": "#ff0000"    // removed in v1.64
  }
}

As of v1.64, indicators for these Unicode warnings are apparently no longer shown in the minimap or overview ruler; see "Consider removing the Unicode highlight scroll bar decoration".

Here is how a Unicode zero-width space (U+200B) appears with these settings:

Unicode zero-width space in Visual Studio Code

The zero-width space is an invisible Unicode character, so its highlighting is controlled by the Unicode Highlight: Invisible Characters setting above.
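To check a file outside the editor, here is a rough Python sketch that flags characters likely to be invisible. It assumes that "invisible" means Unicode format characters (category Cf, which includes U+200B) plus any space separator other than the ordinary ASCII space. That is my own approximation, not the list VS Code uses, and `invisible_chars` is a hypothetical helper name.

```python
import unicodedata


def invisible_chars(text):
    """Yield (index, code point, name) for characters that are likely invisible.

    Assumption: "invisible" means general category Cf (format characters)
    or Zs (space separators) other than the plain ASCII space.
    """
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        if cat == "Cf" or (cat == "Zs" and ch != " "):
            yield i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")


# A zero-width space after "a" and a no-break space after "b".
for hit in invisible_chars("a\u200bb\u00a0c"):
    print(hit)
```

Printing the character name answers the "I don't know which characters these are" part of the question without defining anything in advance.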


From v1.63 release notes: Unicode highlighting

Read this article or this article for how invisible or confusable Unicode characters can be used in Unicode spoofing attacks.

Note that this feature does not detect all possible Unicode spoofing attacks, as this depends on your font and locale settings. Also, the detection of ambiguous characters is done heuristically. To be on the safe side, the restricted mode of the workspace trust should be used to review source code, as all non-ASCII characters are highlighted in untrusted workspaces.

The settings editor.unicodeHighlight.invisibleCharacters, editor.unicodeHighlight.ambiguousCharacters or editor.unicodeHighlight.nonBasicASCII can be set to false to disable the corresponding feature.

Individual characters can be excluded from being highlighted and characters in comments or text and markdown documents are not highlighted by default.

Upvotes: 15

pvilas

Reputation: 1377

In the Find box, search for [^\x00-\x7f] with Use Regular Expression enabled. This matches every character outside the ASCII range.

It is taken from Finding Those Pesky Unicode Characters in Visual Studio.
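If you want a report rather than an interactive search, the same pattern can be applied from a script. The following is a minimal Python sketch, not part of VS Code; `find_non_ascii` is a hypothetical helper name and the sample text is made up for illustration.

```python
import re
import unicodedata

# Same character class as in the Find box: anything outside ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def find_non_ascii(text):
    """Return (line, column, code point, name) for each non-ASCII character."""
    hits = []
    # Split on "\n" only, so unusual Unicode line separators are reported
    # as characters instead of being swallowed as line breaks.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in NON_ASCII.finditer(line):
            ch = match.group()
            hits.append((
                line_no,
                match.start() + 1,  # 1-based column
                f"U+{ord(ch):04X}",
                unicodedata.name(ch, "<unnamed>"),
            ))
    return hits


# Made-up sample: a zero-width space and a pair of curly quotes.
sample = "x = 1\u200b\nprint(\u201chi\u201d)\n"
for hit in find_non_ascii(sample):
    print(hit)
```

To scan a real file, read it with `open(path, encoding="utf-8").read()` and pass the result to `find_non_ascii`. If the file is not valid UTF-8, that read raises a `UnicodeDecodeError`, which is itself a useful signal.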

Upvotes: 110

User

Reputation: 65951

You can try the Gremlins extension which I found better than Highlight Bad Chars (Kevin Wenger) (at least, Gremlins worked out of the box; I couldn't get Highlight Bad Chars to highlight anything).

Upvotes: 42
