Sercan Tırnavalı

Reputation: 297

Ruby irb UTF-8 encoding problem with terminal input on Windows 10

I want to use Ruby with terminal input on Windows. Why can't the Ruby community solve this UTF-8 issue on Windows? Is it hard? I wonder how Python, Java, and other languages handled it. With Python I can work with UTF-8 on Windows with no pain.

With Ruby 3.0.1:

x = gets.chomp
çağrı
=> "\x87a\xA7r\x8D"

puts x
�a�r�
=> nil

x.valid_encoding?
=> false

I looked at https://bugs.ruby-lang.org/issues/16604, but what I found there did not work.

Upvotes: 0

Views: 575

Answers (2)

Sercan Tırnavalı

Reputation: 297

On Turkish Windows PCs, the cmd shell uses the CP857 encoding.

You can see it in the cmd > preferences section.

Here is a practical solution, with contributions from Holger.

irb(main):005:0> x = gets.chomp
Here is the Turkish chars ğĞüÜşŞiİıIöÖçÇ
=> "Here is the Turkish chars \xA7\xA6\x81\x9A\x9F\x9Ei\x98\x8DI\x94\x99\x87\x80"

irb(main):006:0> x.force_encoding "CP857"
=> "Here is the Turkish chars \xA7\xA6\x81\x9A\x9F\x9Ei\x98\x8DI\x94\x99\x87\x80"

irb(main):007:0> x.valid_encoding?
=> true
irb(main):008:0> x.encode("UTF-8", undef: :replace)
=> "Here is the Turkish chars ğĞüÜşŞiİıIöÖçÇ"
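
The same steps can be wrapped in a small helper. This is only a sketch: `console_to_utf8` is a name I made up, and CP857 is an assumption that matches my Turkish cmd shell, so substitute your own code page.

```ruby
# Hypothetical helper: reinterpret raw console bytes as a given code page
# and transcode them to UTF-8.
# Assumption: the console uses CP857; pass another encoding name if not.
def console_to_utf8(raw, console_encoding = "CP857")
  raw.dup                             # leave the caller's string untouched
     .force_encoding(console_encoding) # relabel the bytes, no conversion yet
     .encode("UTF-8", invalid: :replace, undef: :replace)
end

raw = "\x87a\xA7r\x8D".b   # the bytes from the question
converted = console_to_utf8(raw)
p converted.encoding
p converted.valid_encoding?
```

In irb this would be used as `x = console_to_utf8(gets.chomp)`.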
 

Upvotes: 0

Holger Just

Reputation: 55718

With Ruby 3.0, the default external encoding (i.e. the encoding Ruby assumes for any data read from outside the Ruby process, such as from your shell when using gets) changed to UTF-8 on Windows. This was a response to various issues occurring with encodings on Windows.

The data you are reading from your shell, however, is not UTF-8 encoded. Instead, your shell appears to use a different encoding, e.g. cp850.

A possible workaround is to instruct Ruby to assume the locale encoding of your environment, which you can do with the -E switch when invoking the command, e.g.:

irb -E locale

or by setting Encoding.default_external manually in your script to the correct encoding of your environment.
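
For the in-script variant, a minimal sketch could look like the following. It assumes the console code page is CP857 (taken from the other answer; yours may differ), and it additionally sets Encoding.default_internal so that data is transcoded to UTF-8 as it is read. The pipe only stands in for console input so that the example runs anywhere; in a real script you would simply call gets.

```ruby
# Assumption: the console code page is CP857; adjust to your environment.
CONSOLE_ENCODING = Encoding::CP857

# Treat data coming from outside the process as CP857 by default ...
Encoding.default_external = CONSOLE_ENCODING
# ... and transcode it to UTF-8 while reading.
Encoding.default_internal = Encoding::UTF_8

# Simulated console input: a pipe carrying the raw bytes from the question.
reader, writer = IO.pipe
writer.binmode
writer.write("\x87a\xA7r\x8D\n".b)
writer.close

line = reader.gets.chomp
reader.close

p line.encoding
p line.valid_encoding?
```

Note that changing the defaults affects every IO object opened without an explicit encoding, so it is best done once, at the very top of the script.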

Upvotes: 2
