Reputation: 38228
I have an HTML file, test.html, created with Atom, which contains:
Testé encoding utf-8
When I read it in the PowerShell console (I'm using French Windows) with:
Get-Content -Raw test.html
I get back this:
TestÃ© encoding utf-8
Why is the accent character not printing correctly?
Upvotes: 3
Views: 9755
Reputation: 439862
The Atom editor creates UTF-8 files without a pseudo-BOM by default (which is the right thing to do, from a cross-platform perspective).
Windows PowerShell[1] only recognizes UTF-8 files with a pseudo-BOM. Without one, it interprets the file according to the system's legacy ANSI code page, such as Windows-1252. (Get-Content / Set-Content call this encoding Default; it is the actual default and therefore needn't be specified. By contrast, Out-File / > creates UTF-16LE-encoded files (Unicode) by default.)
Therefore, in order for Get-Content to recognize a BOM-less UTF-8 file correctly in Windows PowerShell, you must use -Encoding utf8.
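As a minimal sketch of that fix: the snippet writes a BOM-less UTF-8 file to a scratch location and reads it back with an explicit encoding. The temp-directory path and the variable names are illustrative assumptions, not taken from the question.

```powershell
# Hypothetical scratch file; any BOM-less UTF-8 file should behave the same way.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'test.html'
$text = 'Test' + [char]0xE9 + ' encoding utf-8'   # [char]0xE9 is the accented e

# Write UTF-8 without a BOM ($false = emit no BOM), as Atom does by default.
[System.IO.File]::WriteAllText($path, $text, (New-Object System.Text.UTF8Encoding $false))

# No -Encoding: Windows PowerShell decodes the bytes with the ANSI code page.
Get-Content -Raw $path

# Explicit -Encoding utf8: the bytes are decoded as UTF-8 and the accent survives.
$fixed = Get-Content -Raw -Encoding utf8 $path
$fixed -eq $text
```

The [char]0xE9 construction is used instead of a literal accented character so that the result does not depend on how the script file itself is encoded.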
[1] By contrast, the cross-platform PowerShell Core edition commendably defaults to UTF-8, consistently across cmdlets, both on reading and writing, so it does interpret UTF-8-encoded files correctly even without a BOM and by default also creates files without a BOM.
Upvotes: 7
Reputation: 8931
# Created a UTF-8 Sig File
notepad .\test.html
# Get File contents with/without -raw
cat .\test.html;Get-Content -Raw .\test.html
Testé encoding utf-8
Testé encoding utf-8
# Check Encoding to make sure
Get-FileEncoding .\test.html
utf8
As you can see, it definitely works in PowerShell v5 on Windows 10. I'd double check the file formatting and the contents of the file you created, as there may have been characters introduced which your editor might not pick up.
If you do not have Get-FileEncoding as a cmdlet in your PowerShell, here is an implementation you can run:
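function Get-FileEncoding([Parameter(Mandatory=$True)]$Path) {
    # Read up to the first 4 bytes (-Encoding byte works in Windows PowerShell only)
    $bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 4 -TotalCount 4)
    # An empty file has no BOM; report it as utf8
    if(!$bytes) { return 'utf8' }
    # Compare the leading bytes, formatted as hex, against known BOM signatures
    switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
        '^efbbbf' {return 'utf8'}
        '^2b2f76' {return 'utf7'}
        '^fffe' {return 'unicode'}
        '^feff' {return 'bigendianunicode'}
        '^0000feff' {return 'utf32'}
        # No BOM found; note that a BOM-less UTF-8 file also ends up here
        default {return 'ascii'}
    }
}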
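If the only question is whether a file starts with a UTF-8 BOM, a narrower check may be enough. The following is a sketch under stated assumptions: Test-Utf8Bom is a hypothetical helper name (not a built-in cmdlet), and the two scratch files exist only for illustration. It reads the bytes through .NET, so it should not depend on -Encoding byte, which PowerShell Core replaced with -AsByteStream.

```powershell
# Hypothetical helper: reports whether a file begins with the UTF-8 BOM (EF BB BF).
function Test-Utf8Bom([Parameter(Mandatory=$True)][string]$Path) {
    # Resolve to a full path first: .NET's working directory can differ from PowerShell's.
    $full = (Resolve-Path -LiteralPath $Path).ProviderPath
    # Reads the whole file, which is fine for small files like the one in the question.
    $bytes = [System.IO.File]::ReadAllBytes($full)
    return ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF)
}

# Illustrative scratch files: the same text written with and without a BOM.
$dir     = [System.IO.Path]::GetTempPath()
$withBom = Join-Path $dir 'bom.html'
$noBom   = Join-Path $dir 'nobom.html'
$text    = 'Test' + [char]0xE9   # [char]0xE9 is the accented e

[System.IO.File]::WriteAllText($withBom, $text, (New-Object System.Text.UTF8Encoding $true))
[System.IO.File]::WriteAllText($noBom,   $text, (New-Object System.Text.UTF8Encoding $false))

$hasBom   = Test-Utf8Bom $withBom   # file written with a BOM
$lacksBom = Test-Utf8Bom $noBom     # file written without a BOM
```

A file saved from Notepad as UTF-8 with a signature would fall in the first case, while a default Atom file would fall in the second, which may explain why the two tests in this thread behave differently.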
Upvotes: 1