Ruby and encoding conversion

Question

I'm importing a CSV file into Ruby (1.8.7). File.open('path/to/file.csv').read returns this in the console:

Stefan,Engstr\232m

UniversalDetector (from the chardet gem) identifies the encoding as ISO-8859-2, though only with a confidence of about 0.63:

UniversalDetector::chardet("Stefan,Engstr\232m")
=> {"confidence"=>0.626936305574385, "encoding"=>"ISO-8859-2"}

Trying to convert the string yields the following:

Iconv.conv("UTF-8", "ISO-8859-2", "Stefan,Engstr\232m")
 => "Stefan,Engstrm"

whereas I would expect:

 => "Stefan,Engström"

Could the string really be in some other encoding?
I haven't seen the \232 syntax before. Usually, when a string is mis-encoded, some odd character shows up instead, e.g. � or some Chinese characters.

Let me know if I should provide more information or elaborate on something.

mu is too short · Accepted Answer

Try using 'MacRoman' as your source encoding with Iconv:

>> Iconv.conv("UTF-8", "MacRoman", "Stefan,Engstr\232m")
=> "Stefan,Engström"
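
Some background on why this works: \232 is an octal escape for a single raw byte, here 0x9A (decimal 154). In MacRoman that byte is ö, whereas in ISO-8859-2 it falls in the C1 control range, which would explain why the converted string shows no visible character. The sketch below checks this on a newer Ruby, where Iconv is no longer part of the standard library and String#encode does the same job. The Ruby version (2.0 or later) and the variable names are assumptions for illustration, not part of the original question.

```ruby
# Assumption: Ruby 2.0+ (for String#b); the question itself uses 1.8.7 with Iconv.
raw = "Stefan,Engstr\232m".b   # binary copy of the bytes as read from the file

# "\232" is an octal escape: 2*64 + 3*8 + 2 = 154 = 0x9A
suspect = raw.getbyte(13)      # the byte right after "Stefan,Engstr"
puts format("0x%02X", suspect) # prints 0x9A

# Relabel the same bytes as MacRoman, then transcode to UTF-8.
utf8 = raw.dup.force_encoding("macRoman").encode("UTF-8")
puts utf8                      # prints Stefan,Engström
```

Note that force_encoding only relabels the bytes; encode is the call that actually converts them, which mirrors what Iconv.conv does in one step.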
