I have a simple JavaScript file (let's call it index.js) with the following:
console.log('pérola');
I use VS Code on Windows 10 with PowerShell as its integrated terminal. When I run the file with:
node index.js
I get the following output:
pérola
If I run the following:
node index.js > output.txt
I get the following on the file:
p├®rola
It seems PowerShell has an encoding issue when writing to files: when I open the file in VS Code, the status bar (bottom right) shows the encoding as UTF-16 LE.
I also tried the following:
node index.js | out-file -encoding utf8 output.txt
The file is saved as UTF-8 with BOM, but the text is still wrong: I see p├®rola instead of pérola.
Can someone explain what is wrong here? Thank you.
What node outputs is UTF-8-encoded.
PowerShell's > operator does not pass the underlying bytes through to the output file. [Update: in PowerShell (Core) v7.4+, it now does.]
Instead, PowerShell converts the bytes output by node into .NET strings based on the encoding stored in [Console]::OutputEncoding, and then saves the resulting strings based on the encoding implied by the > operator, which is effectively (though not technically) an alias of the Out-File cmdlet.
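To make those two conversion steps concrete, here is a minimal Node.js sketch that replays them on the bytes from the question. It is an illustration, not what PowerShell literally runs: it assumes the console's code page was 850 (inferred from the "p├®rola" output, not stated in the question), maps only the two non-ASCII bytes involved through a hypothetical partial table, and ignores line endings.

```javascript
// Sketch: replay what Windows PowerShell does with `node index.js > output.txt`.
// Assumption: the console code page is 850 (inferred from the "p├®rola" output).
// Hypothetical partial CP850 table: only the two non-ASCII bytes needed here.
const CP850_PARTIAL = { 0xc3: '\u251c' /* ├ */, 0xa9: '\u00ae' /* ® */ };

function decodeCp850(bytes) {
  let text = '';
  for (const b of bytes) {
    if (b < 0x80) {
      text += String.fromCharCode(b); // CP850 matches ASCII below 0x80
    } else if (b in CP850_PARTIAL) {
      text += CP850_PARTIAL[b];
    } else {
      throw new Error(`byte 0x${b.toString(16)} is not in the partial table`);
    }
  }
  return text;
}

// 1. node writes "pérola" to stdout as UTF-8 bytes: 70 c3 a9 72 6f 6c 61.
const nodeOutput = Buffer.from('p\u00e9rola', 'utf8');

// 2. PowerShell decodes those bytes with [Console]::OutputEncoding (CP850 here),
//    so the two UTF-8 bytes of "é" turn into two separate characters.
const decoded = decodeCp850(nodeOutput);

// 3. `>` re-encodes the already-garbled string as "Unicode" (UTF-16LE with BOM).
const fileBytes = Buffer.concat([
  Buffer.from([0xff, 0xfe]),
  Buffer.from(decoded, 'utf16le'),
]);

console.log(decoded); // p├®rola
console.log(fileBytes.toString('hex')); // fffe70001c25ae0072006f006c006100
```

Because the damage happens in step 2, changing only the output encoding (as with out-file -encoding utf8 in the question) just stores the same wrong characters as UTF-8 with BOM.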
In other words: for PowerShell to properly interpret node's output, you must (temporarily) set [Console]::OutputEncoding to [System.Text.UTF8Encoding]::new().
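Decoding node's UTF-8 output as UTF-8 recovers the original text, which can then be saved in any Unicode encoding without loss. A minimal Node.js sketch, where TextDecoder merely stands in for PowerShell's decoding step (an assumption made for illustration):

```javascript
// Sketch: decode node's UTF-8 output with UTF-8, which is what PowerShell does
// once [Console]::OutputEncoding is [System.Text.UTF8Encoding]::new().
// TextDecoder is only a stand-in for PowerShell's decoding step.
const nodeOutput = Buffer.from('p\u00e9rola', 'utf8'); // 70 c3 a9 72 6f 6c 61

const decoded = new TextDecoder('utf-8').decode(nodeOutput);
console.log(decoded); // pérola

// A correctly decoded string survives any Unicode file encoding, e.g. UTF-16LE.
const roundTripped = Buffer.from(decoded, 'utf16le').toString('utf16le');
console.log(roundTripped === decoded); // true
```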
Additionally, you must then decide which character encoding the output file should have, by using Out-File -Encoding or (preferably, if the input is already text) Set-Content -Encoding instead of >.
That is, you need to do this unless the default character encoding of > / Out-File works for you: it is "Unicode" (UTF-16LE with BOM) in Windows PowerShell, and BOM-less UTF-8 in PowerShell [Core] v6+.
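Once the text is decoded correctly, the encoding choice only changes the bytes on disk, not the characters. As a hedged Node.js illustration (it mimics these encodings rather than calling PowerShell, and omits the trailing newline Out-File adds), these are the bytes each option would produce for pérola:

```javascript
// Sketch: file bytes for the correctly decoded text "pérola" under the encodings
// discussed above; the trailing newline that Out-File appends is omitted.
const text = 'p\u00e9rola';

// Windows PowerShell default for > / Out-File: "Unicode" = UTF-16LE with BOM FF FE.
const windowsDefault = Buffer.concat([Buffer.from([0xff, 0xfe]), Buffer.from(text, 'utf16le')]);

// Windows PowerShell with -Encoding utf8: UTF-8 with BOM EF BB BF.
const windowsUtf8 = Buffer.concat([Buffer.from([0xef, 0xbb, 0xbf]), Buffer.from(text, 'utf8')]);

// PowerShell (Core) v6+ default: UTF-8 without BOM.
const coreDefault = Buffer.from(text, 'utf8');

console.log(windowsDefault.toString('hex')); // fffe7000e90072006f006c006100
console.log(windowsUtf8.toString('hex'));    // efbbbf70c3a9726f6c61
console.log(coreDefault.toString('hex'));    // 70c3a9726f6c61
```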
See also:
This answer for background information on how to make PowerShell console windows use UTF-8 consistently when communicating with external programs[1], both when sending data to them ($OutputEncoding) and when interpreting data from them ([Console]::OutputEncoding):
In short, place the following statement in your $PROFILE:
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
If you're running in the obsolescent Windows PowerShell ISE, you need an additional command to ensure that the ISE first allocates a hidden console behind the scenes; this is not necessary in its recommended replacement, Visual Studio Code with the PowerShell extension:
$null = chcp # Run any console application to force the ISE to create a console.
$OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
This answer for a system-wide way to make non-Unicode (console) applications use UTF-8, available in recent versions of Windows 10. This makes both cmd.exe and PowerShell use UTF-8 by default.[1]
[1] This does not control which encoding PowerShell's own cmdlets use; they have their own defaults, which are, unfortunately, inconsistent in Windows PowerShell, whereas in PowerShell [Core] v6+ (BOM-less) UTF-8 is the consistent default; see this answer.