I have a garbled UTF-8 string that should read "Michèle Huà" but is output as "Michèle HuÃ".
According to this table, it is a mismatch between Windows-1252 and UTF-8: http://www.i18nqa.com/debug/utf8-debug.html
How do I convert it back?
scala> scala.io.Source.fromBytes("Michèle HuÃ".getBytes(), "ISO-8859-1").mkString
res25: String = Michèle HuÃ
scala> scala.io.Source.fromBytes("Michèle HuÃ".getBytes(), "UTF-8").mkString
res26: String = Michèle HuÃ
scala> scala.io.Source.fromBytes("Michèle HuÃ".getBytes(), "Windows-1252").mkString
res27: String = Michèle HuÃ
Thank you
You don't actually have the complete string there, because one character prints as blank. In UTF-8, "à" is the two bytes 0xC3 0xA0; in Windows-1252, 0xC3 is "Ã" and 0xA0 is a non-breaking space (U+00A0). So "Michèle Huà" encoded as UTF-8 but read as Windows-1252 is actually "Michèle HuÃ" followed by that 0xA0 character, which typically pastes as 0x20, an ordinary space.
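The hidden trailing character can be made visible by reproducing the mis-decoding and printing the code points. This is a minimal sketch, assuming the text was encoded as UTF-8 and the bytes were then decoded as Windows-1252; the names `GarbleDemo`, `garble` and `codePoints` are hypothetical and only for illustration.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical helper names, for illustration only.
object GarbleDemo {
  val Windows1252: Charset = Charset.forName("windows-1252")

  // Reproduce the bug: encode as UTF-8, then (wrongly) decode the bytes as Windows-1252.
  def garble(s: String): String =
    new String(s.getBytes(StandardCharsets.UTF_8), Windows1252)

  // List code points as "U+XXXX" so invisible characters such as U+00A0 show up.
  def codePoints(s: String): List[String] =
    s.codePoints().toArray().toList.map(cp => f"U+$cp%04X")

  def main(args: Array[String]): Unit = {
    val garbled = garble("Mich\u00e8le Hu\u00e0")
    // The last two code points come from the UTF-8 bytes 0xC3 0xA0 of the final letter.
    println(codePoints(garbled).takeRight(3)) // List(U+0075, U+00C3, U+00A0)
  }
}
```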
If you include that character, the conversion succeeds: encode the string back to bytes as Windows-1252, then decode those bytes as UTF-8.
scala> val fixed = new String("Michèle HuÃ\u00A0".getBytes("Windows-1252"), "UTF-8")
fixed: String = Michèle Huà
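If the conversion has to run on many strings, a reusable helper can report a failed repair instead of silently producing replacement characters. The following is a minimal sketch, assuming the input is UTF-8 text that was mis-decoded as Windows-1252; the names `MojibakeRepair` and `repair` are hypothetical. It uses strict `java.nio` encoders and decoders, so a string whose 0xA0 was turned into a plain space during copy-paste yields `None` rather than a half-repaired result.

```scala
import java.nio.CharBuffer
import java.nio.charset.{CharacterCodingException, Charset, CodingErrorAction, StandardCharsets}

// Hypothetical helper, for illustration only.
object MojibakeRepair {
  private val Windows1252: Charset = Charset.forName("windows-1252")

  // Reverse the mis-decoding: re-encode as Windows-1252, then decode the bytes as UTF-8.
  // Both coders REPORT errors, so bad input gives None instead of U+FFFD or '?'.
  def repair(garbled: String): Option[String] = {
    val encoder = Windows1252.newEncoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
    val decoder = StandardCharsets.UTF_8.newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
    try {
      val bytes = encoder.encode(CharBuffer.wrap(garbled))
      Some(decoder.decode(bytes).toString)
    } catch {
      case _: CharacterCodingException => None
    }
  }

  def main(args: Array[String]): Unit = {
    // Complete garbled string, ending in U+00C3 U+00A0: repairs cleanly.
    println(repair("Mich\u00c3\u00a8le Hu\u00c3\u00a0"))
    // Trailing U+00A0 replaced by a plain space: 0xC3 0x20 is invalid UTF-8, so None.
    println(repair("Mich\u00c3\u00a8le Hu\u00c3 "))
  }
}
```

Returning `Option[String]` lets callers fall back to the input when the repair does not apply, for example with `repair(s).getOrElse(s)`.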