Polar Bear
Polar Bear

Reputation: 6798

How to read a file in utf8 encoding and output in Windows 10?

What is proper procedure to read and output utf8 encoded data in Windows 10?

My attempt to read utf8 encoded file in Windows 10 and output lines into terminal does not reproduce symbols of some languages.

In cmd window issued command chcp 65001. Following ruby code reads utf8 encoded file and outputs lines with puts.

fname = 'hello_world.dat'

File.open(fname,'r:UTF-8') do |f|
    puts f.read
end

hello_world.dat content

Afrikaans:    Hello Wêreld!
Albanian:     Përshendetje Botë!
Amharic:      ሰላም ልዑል!
Arabic:       مرحبا بالعالم!
Armenian:     Բարեւ աշխարհ!
Basque:       Kaixo Mundua!
Belarussian:  Прывітанне Сусвет!
Bengali:      ওহে বিশ্ব!
Bulgarian:    Здравей свят!
Catalan:      Hola món!
Chichewa:     Moni Dziko Lapansi!
Chinese:      你好世界!
Croatian:     Pozdrav svijete!
Czech:        Ahoj světe!
Danish:       Hej Verden!
Dutch:        Hallo Wereld!
English:      Hello World!
Estonian:     Tere maailm!
Finnish:      Hei maailma!
French:       Bonjour monde!
Frisian:      Hallo wrâld!
Georgian:     გამარჯობა მსოფლიო!
German:       Hallo Welt!
Greek:        Γειά σου Κόσμε!
Hausa:        Sannu Duniya!
Hebrew:       שלום עולם!
Hindi:        नमस्ते दुनिया!
Hungarian:    Helló Világ!
Icelandic:    Halló heimur!
Igbo:         Ndewo Ụwa!
Indonesian:   Halo Dunia!
Italian:      Ciao mondo!
Japanese:     こんにちは世界!
Kazakh:       Сәлем Әлем!
Khmer:        សួស្តី​ពិភពលោក!
Kyrgyz:       Салам дүйнө!
Lao:          ສະ​ບາຍ​ດີ​ຊາວ​ໂລກ!
Latvian:      Sveika pasaule!
Lithuanian:   Labas pasauli!
Luxemburgish: Moien Welt!
Macedonian:   Здраво свету!
Malay:        Hai dunia!
Malayalam:    ഹലോ വേൾഡ്!
Mongolian:    Сайн уу дэлхий!
Myanmar:      မင်္ဂလာပါကမ္ဘာလောက!
Nepali:       नमस्कार संसार!
Norwegian:    Hei Verden!
Pashto:       سلام نړی!
Persian:      سلام دنیا!
Polish:       Witaj świecie!
Portuguese:   Olá Mundo!
Punjabi:      ਸਤਿ ਸ੍ਰੀ ਅਕਾਲ ਦੁਨਿਆ!
Romanian:     Salut Lume!
Russian:      Привет мир!
Scots Gaelic: Hàlo a Shaoghail!
Serbian:      Здраво Свете!
Sesotho:      Lefatše Lumela!
Sinhala:      හෙලෝ වර්ල්ඩ්!
Slovenian:    Pozdravljen svet!
Spanish:      ¡Hola Mundo!
Sundanese:    Halo Dunya!
Swahili:      Salamu Dunia!
Swedish:      Hej världen!
Tajik:        Салом Ҷаҳон!
Thai:         สวัสดีชาวโลก!
Turkish:      Selam Dünya!
Ukrainian:    Привіт Світ!
Uzbek:        Salom Dunyo!
Vietnamese:   Chào thế giới!
Welsh:        Helo Byd!
Xhosa:        Molo Lizwe!
Yiddish:      העלא וועלט!
Yoruba:       Mo ki O Ile Aiye!
Zulu:         Sawubona Mhlaba!

Steven Penny suggested to use PowerShell and do not change code page. Following picture demonstrates that the issue persists.

Windows Terminal installer (which is not a part of Windows distribution) solves utf8 output issue, please see included screen capture.

image_1 image_2 Power Shell Windows Terminal

Upvotes: 0

Views: 284

Answers (1)

Zombo
Zombo

Reputation: 1

The problem is, you are using a some methods and tools that are really old. First:

  • Native codepage: 437
  • Switched codepage: 65001

You don't need to mess with the codepage any more, just leave it as the default. Also, from you picture I see you are also using Console Host, which is also really old. Windows Terminal [1] has been available since 2019, and has built in UTF-8 support. Using Windows Terminal, I can run your script, even without specifying UTF-8:

fname = 'hello_world.dat'

File.open(fname,'r') do |f|
   puts f.read
end

and I get perfect result:

To use Windows Terminal, download the msixbundle file [2], then install it. Or, as it's essentially just a Zip file, you can rename it to file.zip and extract it with Windows, then run WindowsTerminal.exe. Or, since you are really having trouble with this process, you can use a portable version I just created [3] (at your own risk).

  1. https://github.com/microsoft/terminal
  2. https://github.com/microsoft/terminal/releases/tag/v1.8.1444.0
  3. https://github.com/microsoft/terminal/files/6563899/CascadiaPackage_1.8.1444.0_x64.zip

Upvotes: 1

Related Questions