Harry

Reputation: 11

Is there a way to make VS Code not replace unknown text characters?

I'm currently using VS Code to write a PowerShell script. As part of this script, a regex is used to replace/remove an atypical character that ends up in the data fairly often and causes trouble down the line. The character is ’ (U+2019), and when the script is opened in VS Code it is permanently replaced with � (U+FFFD).

thus the line: $user.Name = $user.Name -Replace "'|\’|\(|\)|\s+",""

Permanently becomes: $user.Name = $user.Name -Replace "'|\�|\(|\)|\s+",""

until it is manually changed. Since I can paste the U+2019 character back in once the file is open and then run the code, I assume that VS Code can interpret it correctly and the problem is with how the file is loaded. Is there an option I can set to stop the character from being replaced when I open the file?

Upvotes: 1

Views: 8359

Answers (3)

Kane Shynin

Reputation: 101

In my case, turning on the VS Code setting "Files: Auto Guess Encoding" ("files.autoGuessEncoding" in settings.json) fixed the problem, both for reading and saving.

Upvotes: 3

js2010

Reputation: 27433

If I save in VS Code with the Windows 1252 encoding, I see the character "’" change to "�" on next opening. I think the problem is that VS Code doesn't recognize Windows 1252; it opens the file as UTF-8. If you reopen with the Windows 1252 encoding, it displays correctly. The other encodings work fine, including for displaying the character. This includes UTF-8 without a BOM.

Even PowerShell 5 doesn't have this problem with Windows 1252, only VS Code. Set-Content and Get-Content in PowerShell 5 default to Windows 1252.

"’" | set-content file
get-content file

’

PowerShell 7 would actually have the same problem, because it reads the file as UTF-8:

get-content file

�
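
The mismatch described above does not depend on PowerShell or VS Code and can be reproduced at the byte level. Here is a minimal Python sketch, assuming the file was written as Windows-1252 and then read as UTF-8. The variable names are only illustrative.

```python
# U+2019 (right single quotation mark), written as an escape so this
# snippet itself contains only ASCII
quote = "\u2019"

# Windows-1252 stores the character as the single byte 0x92
cp1252_bytes = quote.encode("cp1252")

# UTF-8 needs three bytes for the same character
utf8_bytes = quote.encode("utf-8")

# Reading the Windows-1252 byte as UTF-8: a lone 0x92 is not a valid
# UTF-8 sequence, so a lenient decoder substitutes U+FFFD
misread = cp1252_bytes.decode("utf-8", errors="replace")

# Reading it with the encoding it was written in recovers the character
reread = cp1252_bytes.decode("cp1252")

print(cp1252_bytes.hex(), utf8_bytes.hex())
print(f"U+{ord(misread):04X}", f"U+{ord(reread):04X}")
```

If an editor then saves the misread text, the U+FFFD is written to disk and the original character is lost, which would match the "permanent" replacement described in the question.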

Upvotes: 0

HAL9256

Reputation: 13453

This looks like it all comes down to encoding. Visual Studio Code by default uses UTF-8 and can in general handle saving/viewing Unicode properly.

If the issue is on opening the file, then Visual Studio Code is misinterpreting the file's encoding when it opens it. You can change the encoding VS Code uses (see "Configuring VS Code encoding"), e.g. UTF-8, UTF-8 with BOM, UTF-16 LE, etc., by changing the "files.encoding" setting:

"files.encoding": "utf8bom"

If the issue is on saving the file, then it is being saved in a legacy single-byte encoding such as Windows-1252 (often loosely called "ANSI") and not as proper UTF-8 or equivalent. The byte written for the character is not valid UTF-8, so it is displayed as the Replacement Character (U+FFFD) the next time the file is opened as UTF-8.

Note: The default encoding used by Windows PowerShell v5.1 is Windows-1252, which may be why saving scripts with special characters does not work. PowerShell Core v6+ uses UTF-8 by default.
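
One way to get an existing script into an encoding that both the editor and the shell agree on is to re-save it once as UTF-8 with a BOM, matching the "utf8bom" value above. This is a rough Python sketch, assuming the file is currently Windows-1252. The function name resave_as_utf8_bom and its parameters are hypothetical, not part of any tool mentioned here.

```python
from pathlib import Path


def resave_as_utf8_bom(path, source_encoding="cp1252"):
    """Rewrite a text file as UTF-8 with a BOM.

    Assumes the file is currently encoded in source_encoding; if that
    assumption is wrong, the text will be decoded incorrectly.
    """
    p = Path(path)
    # Work on raw bytes so line endings are preserved exactly
    text = p.read_bytes().decode(source_encoding)
    # "utf-8-sig" prepends the BOM bytes EF BB BF when encoding
    p.write_bytes(text.encode("utf-8-sig"))
```

Calling resave_as_utf8_bom("script.ps1") on a copy of the script (the file name is a placeholder) rewrites it in place, so keep a backup until the result has been checked.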

Upvotes: 3
