Reputation: 297
I want to use Ruby with terminal input on Windows. Why can't the Ruby community solve this UTF-8 issue on Windows? Is it hard? I wonder how Python, Java and other languages handle it; I can work with UTF-8 in Python on Windows without any pain.
With ruby 3.0.1
x = gets.chomp
çağrı
=> "\x87a\xA7r\x8D"
puts x
�a�r�
=> nil
x.valid_encoding?
=> false
I looked at https://bugs.ruby-lang.org/issues/16604, but it did not solve the problem.
Upvotes: 0
Views: 575
Reputation: 297
On a Turkish Windows PC, the cmd shell uses the CP857 encoding.
You can see it in the cmd > preferences section.
Here is a practical solution, with contributions from Holger.
irb(main):005:0> x = gets.chomp
Here is the Turkish chars ğĞüÜşŞiİıIöÖçÇ
=> "Here is the Turkish chars \xA7\xA6\x81\x9A\x9F\x9Ei\x98\x8DI\x94\x99\x87\x80"
irb(main):006:0> x.force_encoding "CP857"
=> "Here is the Turkish chars \xA7\xA6\x81\x9A\x9F\x9Ei\x98\x8DI\x94\x99\x87\x80"
irb(main):007:0> x.valid_encoding?
=> true
irb(main):008:0> x.encode("UTF-8", undef: :replace)
=> "Here is the Turkish chars ğĞüÜşŞiİıIöÖçÇ"
Upvotes: 0
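The irb session in the answer above can be wrapped into a small helper. This is only a minimal sketch: the method name console_to_utf8 is hypothetical, and CP857 as the default is an assumption that holds for a Turkish cmd shell; other systems will need their own code page (e.g. CP850).

```ruby
# Hypothetical helper (not part of Ruby): re-tag bytes read from the console
# with the console's code page, then transcode them to UTF-8.
# Assumption: the console uses CP857; pass a different name otherwise.
def console_to_utf8(raw, console_encoding = "CP857")
  raw.dup
     .force_encoding(console_encoding)                  # reinterpret the bytes, no conversion yet
     .encode("UTF-8", invalid: :replace, undef: :replace) # real conversion to UTF-8
end

# The bytes from the question, as they arrive from a CP857 console.
raw = "\x87a\xA7r\x8D".b
puts console_to_utf8(raw) # prints çağrı
```

With real input this would be called as console_to_utf8(gets.chomp).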
Reputation: 55718
With Ruby 3.0, the default external encoding (i.e. the assumed encoding of any data read from outside the Ruby process, such as from your shell when using gets) changed to UTF-8 on Windows. This was a response to various issues occurring with encodings on Windows.
The data you are reading from your shell, however, is not UTF-8 encoded. Instead, your shell appears to use a different encoding, e.g. cp850.
A possible workaround is to instruct Ruby to assume the locale encoding of your environment, which you can set with the -E switch on the command invocation, e.g.:
irb -E locale
or by setting Encoding.default_external manually in your script to the correct encoding of your environment.
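As a rough sketch of the in-script variant: Encoding.find("locale") returns the encoding Ruby detects for the current environment. This assumes that the detected locale encoding matches what the console actually sends, which may not hold on every setup; in that case a concrete name such as "CP857" can be assigned instead.

```ruby
# Assumption: the locale encoding Ruby detects is the one the console uses.
locale_encoding = Encoding.find("locale")

# Data read from outside the process without an explicit encoding
# is tagged with this encoding from now on.
Encoding.default_external = locale_encoding

puts "default external encoding: #{Encoding.default_external.name}"
```

Strings read this way are tagged with the console encoding and can still be converted with encode("UTF-8") where UTF-8 is needed.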
Upvotes: 2