rGiosa

Reputation: 365

Ruby UTF-8 Encoding doesn't work in Windows even with Magic Comment

I'm trying to run a file (ruby anyfile.rb in cmd prompt) with the following contents:

# encoding: utf-8
puts 'áá'

The following error occurs:

invalid multibyte char (UTF-8)

It seems that Ruby does not understand the magic comment...

EDIT: If I remove the "# encoding: utf-8" line and run the script from the command prompt like this:

ruby -E:UTF-8 encoding.rb

then it works - any ideas?

EDIT2: When I run:

ruby -e 'p [Encoding.default_external, Encoding.default_internal]'

I get [#<Encoding:CP850>, nil]. Maybe my Encoding.default_external is wrong?
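As far as I know, the magic comment and Encoding.default_external govern different things: the comment sets the encoding Ruby assumes for the source file itself, while default_external is the default for data read from and written to IO. A small sketch to print all three values side by side (the encoding_report name is only illustrative, not part of Ruby):

```ruby
# Hypothetical helper: collect the encodings Ruby is using right now.
def encoding_report
  {
    source: __ENCODING__.name,                    # encoding of this source file
    external: Encoding.default_external.name,     # default for IO data
    internal: Encoding.default_internal && Encoding.default_internal.name # nil unless set
  }
end

encoding_report.each { |key, value| puts "#{key}: #{value.inspect}" }
```

If source reports UTF-8 while external reports CP850, the magic comment is being honoured and the console code page is a separate matter.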

Environment:

Upvotes: 2

Views: 4003

Answers (4)

Luis Lavena

Reputation: 10378

I've encountered similar issues from time to time with files that were not saved as UTF-8, even when the magic comment states so.

I've found that Ruby 1.9.2 had trouble correctly converting UTF-8 to code pages 850 and 437, the defaults for the command prompt on Windows.

I recommend you upgrade to Ruby 1.9.3 (latest is patchlevel 125), which solves a lot of encoding issues, especially on Windows.

Also, verify that your saved file does not contain a Unicode BOM (so it is plain UTF-8) and is properly saved.

To check that, you can switch the console code page to Unicode (chcp 65001) and then run type myscript.rb

You should see the accented letters correctly.
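The BOM check can also be done from Ruby itself. This is only a minimal sketch; utf8_bom? is a made-up helper name, and the script expects the file paths to inspect as command-line arguments:

```ruby
# Hypothetical helper: true when the file starts with the UTF-8 BOM (bytes EF BB BF).
def utf8_bom?(path)
  # Read in binary mode so no transcoding happens; to_s covers an empty file (read returns nil).
  first_bytes = File.open(path, 'rb') { |f| f.read(3) }.to_s
  first_bytes.bytes.to_a == [0xEF, 0xBB, 0xBF]
end

# Usage: ruby check_bom.rb myscript.rb
ARGV.each { |path| puts "#{path}: BOM present? #{utf8_bom?(path)}" }
```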

Last but not least, ensure your command prompt uses a TrueType font so extended characters are properly displayed.

Hope that helps.

Upvotes: 2

Eifion

Reputation: 5563

Are you sure you selected 'UTF-8' from the Encoding dropdown when you saved the file in Notepad? I've just tried this on an XP machine and your code example worked for me.

Upvotes: 0

Jörg W Mittag

Reputation: 369624

I believe this is a classic case of "if you hear hooves, think horses, not zebras".

The error message is telling you that you have a byte sequence in your file that is not a valid UTF-8 multibyte sequence.

It is definitely possible that

"It seems that Ruby does not understand the magic comment..."

as you say, and that up until now nobody noticed that magic comments don't actually work because you are the first person in the history of humankind to actually try to use magic comments. (Actually, this is not possible. If Ruby didn't understand magic comments, it would complain about an invalid ASCII character, since ASCII is the default encoding if no magic comment is present.)

Or, there actually is an invalid multibyte UTF-8 sequence in your file.

Which do you think is more likely? If I were you, I would check my file.
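One way to check the file is to read it as raw bytes and ask Ruby which lines are not valid UTF-8. A rough sketch, assuming the paths are passed on the command line (invalid_utf8_lines is a name I made up for illustration):

```ruby
# Hypothetical helper: 1-based numbers of the lines whose bytes are not valid UTF-8.
def invalid_utf8_lines(path)
  bad = []
  # Binary mode keeps the bytes exactly as they are on disk.
  File.open(path, 'rb') do |f|
    f.each_line.with_index(1) do |line, number|
      # Reinterpret the raw bytes as UTF-8 and test whether they form valid sequences.
      bad << number unless line.force_encoding('UTF-8').valid_encoding?
    end
  end
  bad
end

# Usage: ruby check_utf8.rb anyfile.rb
ARGV.each { |path| puts "#{path}: invalid lines #{invalid_utf8_lines(path).inspect}" }
```

An empty list means every line decodes as UTF-8; any line number reported points at bytes that were probably saved in a different encoding.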

Upvotes: 3

Reactormonk

Reputation: 21740

Try

# encoding: iso-8859-1

Not everything that's text is UTF-8.
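To illustrate, assuming the editor saved the file as ISO-8859-1: each á is then the single byte 0xE1, which is not a valid UTF-8 sequence on its own, but converts cleanly once Ruby is told the real encoding. The helper names below are illustrative only:

```ruby
# Hypothetical helper: do these raw bytes form valid UTF-8?
def valid_as_utf8?(bytes)
  bytes.pack('C*').force_encoding('UTF-8').valid_encoding?
end

# Hypothetical helper: label the bytes with their real encoding, then transcode to UTF-8.
def as_utf8(bytes, from)
  bytes.pack('C*').force_encoding(from).encode('UTF-8')
end

latin1_bytes = [0xE1, 0xE1] # two a-acute characters as stored in ISO-8859-1

puts valid_as_utf8?(latin1_bytes)                        # false
puts as_utf8(latin1_bytes, 'ISO-8859-1') == "\u00E1\u00E1" # true
```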

Upvotes: 0
