Reputation: 22123
The characters 0x91, 0x92, 0x93, and 0x94 are supposed to represent what in Unicode are U+2018, U+2019, U+201c, and U+201d, or the "opening single quote", "closing single quote", "opening double quote", and "closing double quote". I thought that it was ISO-8859-1 but when I try to process a file using IO.read('file', :encoding=>'ISO-8859-1') it still does not recognize these characters.
If it isn't ISO-8859-1 then what is it? And if it is, why doesn't Ruby recognize these characters?
UPDATE: Apparently this encoding is supposed to be Windows-1252. But Ruby still does not recognize these characters when I do IO.read('file', :encoding=>'Windows-1252').
UPDATE 2: Never mind, Windows-1252 works.
Upvotes: 0
Views: 1111
Reputation: 434665
0x91 is the Windows-1252 representation of Unicode's \u2018 (AKA ‘):
>> "\x91".force_encoding('windows-1252').encode('utf-8')
=> "‘"
Windows-1252 and Latin-1 (AKA ISO 8859-1) are not the same: in ISO 8859-1, bytes 0x80 through 0x9F are control characters rather than printable ones. Try using windows-1252 as the encoding:
IO.read('file', :encoding => 'windows-1252')
That will give you a string that knows it is Windows-1252. If you want UTF-8, then perhaps you want to specify the :internal_encoding and :external_encoding:
IO.read('file', :external_encoding => 'windows-1252', :internal_encoding => 'utf-8')
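To see the difference concretely, here is a minimal sketch that decodes the four bytes from the question under both encodings, then reads a small file as Windows-1252 and transcodes it to UTF-8. The temporary file name and its contents are made up for illustration; substitute your own file.

```ruby
require 'tmpdir'

# The four bytes from the question, as a raw binary string.
bytes = "\x91\x92\x93\x94".b

# Tag the same bytes with each encoding, then transcode to UTF-8.
as_cp1252 = bytes.dup.force_encoding('Windows-1252').encode('UTF-8')
as_latin1 = bytes.dup.force_encoding('ISO-8859-1').encode('UTF-8')

# Windows-1252 yields the curly quotes; ISO-8859-1 yields C1 control characters.
p as_cp1252.codepoints.map { |c| format('U+%04X', c) }
p as_latin1.codepoints.map { |c| format('U+%04X', c) }

# Hypothetical sample file: the word "quoted" wrapped in bytes 0x93 and 0x94.
path = File.join(Dir.tmpdir, "cp1252_sample_#{Process.pid}.txt")
File.binwrite(path, "\x93quoted\x94".b)

# Decode as Windows-1252 on the way in, hand back a UTF-8 string.
text = IO.read(path, :external_encoding => 'windows-1252', :internal_encoding => 'utf-8')
p text.encoding

File.delete(path)
```

The first list should show U+2018, U+2019, U+201C, and U+201D, while the second shows U+0091 through U+0094, which is why reading the file as ISO-8859-1 appears not to "recognize" the quotes.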
Upvotes: 3