I'm importing a CSV file into Ruby (1.8.7). File.open('path/to/file.csv').read returns this in the console:
Stefan,Engstr\232m
UniversalDetector (from the chardet gem) identifies the encoding as ISO-8859-2:
UniversalDetector::chardet("Stefan,Engstr\232m")
=> {"confidence"=>0.626936305574385, "encoding"=>"ISO-8859-2"}
Trying to convert the string yields the following:
Iconv.conv("UTF-8", "ISO-8859-2", "Stefan,Engstr\232m")
=> "Stefan,Engstrm"
whereas I would expect:
=> "Stefan,Engström"
Let me know if I should provide more information or elaborate on something.
The encoding is probably "Macintosh Roman"; a couple of other options would be "Mac Central European" and "Mac Icelandic". The \nnn notation uses octal, so \232 is 154 in decimal, and character 154 is the lowercase o-umlaut ("ö") that you're expecting in all three of those encodings. I don't see "ö" at 154 in any of the Windows code pages or ISO 8859 character sets (in the ISO 8859 sets, 154 falls in the 0x80 to 0x9F control-character range, which is likely why the ISO-8859-2 conversion seems to drop it). I'd guess that Mac Roman is more common than the Icelandic or Central European encodings.
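To make the octal-to-decimal step and the encoding comparison concrete, here is a small sketch. It assumes Ruby 1.9 or later (String#force_encoding and String#encode are not available in 1.8.7) and assumes the installed Ruby ships transcoders for Windows-1252 and macRoman. It decodes the single byte 154 under three candidate encodings and prints what each one maps it to.

```ruby
# Sketch assuming Ruby 1.9+ (force_encoding/encode do not exist in 1.8.7).
byte = "\232".unpack("C").first   # the octal escape \232 is a single byte
puts byte                          # 154 (decimal), i.e. 0x9A

# Decode that same byte under a few candidate single-byte encodings.
# Assumes this Ruby has transcoders for all three names below.
candidates = %w[ISO-8859-2 Windows-1252 macRoman]
decoded = candidates.map do |enc|
  [enc, [byte].pack("C").force_encoding(enc).encode("UTF-8")]
end

decoded.each { |enc, str| puts "#{enc}: #{str.inspect}" }
```

Only macRoman turns the byte into "ö"; ISO-8859-2 yields an invisible control character and Windows-1252 yields a different letter ("š"), which matches the reasoning above.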
Try using 'MacRoman' as your source encoding with Iconv:
>> Iconv.conv("UTF-8", "MacRoman", "Stefan,Engstr\232m")
=> "Stefan,Engström"
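On newer Rubies, Iconv was deprecated in 1.9.3 and dropped from the standard library in 2.0, and String#encode does the same job. The sketch below assumes Ruby 1.9+ with the macRoman transcoder available, and assumes the data really is Mac Roman.

```ruby
# Sketch for Ruby 1.9+: String#encode in place of Iconv.conv.
# Assumes the input bytes are Mac Roman, as guessed above.
raw  = "Stefan,Engstr\232m"                 # bytes as read from the CSV file
utf8 = raw.dup.force_encoding("macRoman").encode("UTF-8")
puts utf8                                   # Stefan,Engström
```

To transcode while reading the file instead, something like File.read('path/to/file.csv', encoding: 'macRoman:UTF-8') should work on 1.9+, under the same assumption about the file's encoding.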