Jaehyun Shin

Reputation: 1602

Elixir: convert encoding from EUC-KR (JP, CH, ...) to UTF-8

I'm making a crawling app and want to parse some characters, but some pages are not in the UTF-8 charset.

I have the page body and now want to do some work with the body string. First of all, I need to convert it to UTF-8 if the page encoding is not UTF-8.

How can I do this?

Upvotes: 1

Views: 1587

Answers (1)

Patrick Oscity

Reputation: 54684

You can use the Erlang iconv library to do such conversions. It's easy!

  1. Make sure you have iconv installed on your system
  2. Add {:iconv, "~> 1.0.0"} to deps and :iconv to applications in mix.exs
  3. Convert with :iconv.convert("euc-kr", "utf-8", "input"), where the arguments are the source encoding, the target encoding, and the binary to convert
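
The steps above could be sketched as a standalone script. This is only a rough illustration, assuming Elixir 1.12+ (for Mix.install), a system iconv, and that the crawler already knows the page's charset. The PageEncoding module and to_utf8/2 are hypothetical names used for illustration, not part of the iconv library.

```elixir
# Sketch only: fetches the iconv dependency for a one-off script.
# In a real project the dependency goes into deps in mix.exs instead.
Mix.install([{:iconv, "~> 1.0.0"}])

defmodule PageEncoding do
  # Hypothetical helper. `charset` is whatever encoding the crawler
  # detected for the page (for example from the Content-Type header).
  def to_utf8(body, charset) when is_binary(body) and is_binary(charset) do
    if String.downcase(charset) in ["utf-8", "utf8"] do
      # Already UTF-8, nothing to convert.
      body
    else
      # Argument order: source encoding, target encoding, data.
      :iconv.convert(charset, "utf-8", body)
    end
  end
end

# Bytes assumed to be the EUC-KR encoding of "한글".
euc_kr = <<0xC7, 0xD1, 0xB1, 0xDB>>
utf8 = PageEncoding.to_utf8(euc_kr, "euc-kr")

# A successful conversion should yield a valid UTF-8 string.
IO.inspect(String.valid?(utf8))
```

Skipping the conversion for pages that are already UTF-8 avoids an unnecessary round trip through iconv. How the charset is detected is left out here, since it depends on the HTTP client and on what the pages declare.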

You can find a list of supported encodings on the libiconv page or by running iconv --list on the command line.

Upvotes: 2
