Reputation: 121010
Assume we get a charlist from a foreign source, and it basically represents a string in some legacy 1-byte encoding like ISO-8859-2. There is a Codepagex package that simplifies conversions between different encodings, but its to_string function expects a binary as input.
All the standard library functions (like to_string, IO.chardata_to_string, "#{}", etc.) assume Latin1, aka ISO-8859-1, as the input encoding when transforming to UTF-8.
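A quick check of that behaviour, as a standalone script using only the standard library. Strictly speaking, these functions treat each integer as a Unicode codepoint; for values 0..255 that happens to coincide with Latin-1, so a single legacy byte 224 comes out as the two-byte UTF-8 string "à". The input value is just an illustrative assumption.

```elixir
# A charlist holding a single legacy byte, 0xE0 (224).
input = [224]

# Each standard conversion treats 224 as the codepoint U+00E0, not as a raw byte.
via_to_string = to_string(input)
via_chardata = IO.chardata_to_string(input)
via_interpolation = "#{input}"

# All three agree: the UTF-8 encoding of "à", two bytes long.
IO.inspect(via_to_string, binaries: :as_binaries)
IO.puts(via_to_string == via_chardata and via_chardata == via_interpolation)
IO.puts(byte_size(via_to_string))
```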
What I came up with is:
input
|> to_string()
|> Codepagex.from_string!(:iso_8859_1)
|> Codepagex.to_string!(:iso_8859_2) # target encoding
which is a bit ugly.
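The first two steps of that pipeline only round-trip the bytes: to_string encodes each byte as a codepoint in UTF-8, and re-encoding as ISO-8859-1 turns each codepoint back into the same single byte. A dependency-free sketch of that equivalence uses Erlang's :unicode.characters_to_binary/3 in place of Codepagex.from_string!/2 for the Latin-1 step. The sample bytes are an assumption: they are "Łódź" in ISO-8859-2.

```elixir
# Hypothetical input: "Łódź" as ISO-8859-2 bytes (0xA3 0xF3 0x64 0xBC).
input = [0xA3, 0xF3, 0x64, 0xBC]

# Step 1: each byte becomes a codepoint, e.g. 0xA3 is read as "£" (U+00A3).
utf8 = to_string(input)

# Step 2: re-encode those codepoints as Latin-1, one byte each.
bytes = :unicode.characters_to_binary(utf8, :unicode, :latin1)

# The round trip just reproduces the original bytes.
IO.inspect(bytes == :erlang.list_to_binary(input))
```

So the detour through UTF-8 and back can be replaced by any function that copies the integers into a binary directly.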
Is there any robust and handy built-in/idiomatic Elixir way to get a string out of a charlist in a known encoding?
Upvotes: 1
Views: 287
Reputation: 222288
to_string on a list of integers in Elixir treats the integers as Unicode codepoints (to_string [960] #=> "π"), while you want to treat each integer as a byte. In Erlang, this can be done using list_to_binary. I couldn't find any wrapper for this in Elixir's built-in modules, but you can always call :erlang.list_to_binary directly:
iex(1)> [224] |> :erlang.list_to_binary
<<224>>
iex(2)> inspect([224] |> to_string(), binaries: :as_binaries)
"<<195, 160>>"
iex(3)> [224] |> :erlang.list_to_binary |> Codepagex.to_string!(:iso_8859_1)
"à"
iex(4)> [224] |> :erlang.list_to_binary |> Codepagex.to_string!(:iso_8859_2)
"ŕ"
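The byte-for-byte conversion can be wrapped in a small helper. This is a sketch; the module and function names (LegacyCharlist, to_bytes/1) are hypothetical. It uses Elixir's IO.iodata_to_binary/1, which also accepts a flat list of bytes because such a list is valid iodata; like :erlang.list_to_binary/1, it raises ArgumentError for integers above 255. The Codepagex call is left in a comment so the snippet runs without dependencies.

```elixir
defmodule LegacyCharlist do
  @moduledoc """
  Hypothetical helper: turn a charlist of raw legacy bytes into a binary
  without any intermediate UTF-8 encoding.
  """

  # Expects integers in 0..255; larger values raise ArgumentError.
  @spec to_bytes([byte()]) :: binary()
  def to_bytes(charlist) when is_list(charlist) do
    # IO.iodata_to_binary/1 wraps :erlang.iolist_to_binary/1; for a flat
    # list of bytes it gives the same result as :erlang.list_to_binary/1.
    IO.iodata_to_binary(charlist)
  end
end

bytes = LegacyCharlist.to_bytes([224])
IO.inspect(bytes)

# With the codepagex dependency installed, decoding would then be:
# Codepagex.to_string!(bytes, :iso_8859_2)
```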
Upvotes: 1