user376845

Encoding emoji in Erlang

Assume I have a binary:

Message = <<"string containing emoji">>.

How do I encode it properly as Unicode (for example UTF-16)? I tried:

Encoded = <<Message/utf16>>.

I get this warning when compiling the file:

Warning: binary construction will fail with a 'badarg' exception (invalid Unicode code point in a utf8/utf16/utf32 segment)

I tried /utf8 as well and got the same warning.

Upvotes: 0

Views: 1482

Answers (2)

legoscia

Reputation: 41528

Assuming the binary you start with is UTF-8 encoded and you need to convert it to little-endian UTF-16, this should work:

unicode:characters_to_binary(<<"string containing emoji">>, utf8, {utf16, little})

See the documentation of the unicode module for more information.
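A minimal sketch wrapping that call, assuming you want conversion errors returned rather than silently passed along. The module name utf16_convert and its functions are hypothetical; only unicode:characters_to_binary/3 comes from OTP. That function returns a binary on success, or an {error, ...} or {incomplete, ...} tuple when the input is not valid in the source encoding.

```erlang
%% Hypothetical helper module: convert between UTF-8 and
%% little-endian UTF-16 with unicode:characters_to_binary/3.
-module(utf16_convert).
-export([utf8_to_utf16le/1, utf16le_to_utf8/1]).

%% UTF-8 in, UTF-16LE out. Invalid or truncated input is reported
%% as {error, ...} instead of returning a partial binary.
utf8_to_utf16le(Utf8) when is_binary(Utf8) ->
    case unicode:characters_to_binary(Utf8, utf8, {utf16, little}) of
        Bin when is_binary(Bin) ->
            {ok, Bin};
        {error, Converted, Rest} ->
            {error, {invalid, Converted, Rest}};
        {incomplete, Converted, Rest} ->
            {error, {incomplete, Converted, Rest}}
    end.

%% The reverse direction, UTF-16LE in, UTF-8 out.
utf16le_to_utf8(Utf16) when is_binary(Utf16) ->
    case unicode:characters_to_binary(Utf16, {utf16, little}, utf8) of
        Bin when is_binary(Bin) -> {ok, Bin};
        Other -> {error, Other}
    end.
```

For U+1F64C, utf16_convert:utf8_to_utf16le(<<240,159,153,140>>) gives {ok, <<61,216,76,222>>}: the surrogate pair D83D DE4C with each 16-bit unit stored low byte first.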

<<Message/utf16>> fails because the utf8, utf16 and utf32 type specifiers in bit syntax encode a single integer code point, not an entire string, and Message is a binary. So to encode the character U+1F64C, you could use:

2> <<16#1f64c/utf8>>.
<<240,159,153,140>>
3> <<16#1f64c/utf16>>.
<<"\330=\336L">>
4> <<16#1f64c/utf32>>.
<<0,1,246,76>>
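Because each specifier handles one code point, a whole string can also be re-encoded with a binary comprehension that decodes one UTF-8 code point and re-encodes it per iteration. This is a sketch for illustration (the module name per_codepoint is hypothetical); note that input which is not valid UTF-8 is not reported as an error here, so unicode:characters_to_binary/3 is the safer choice for untrusted data.

```erlang
%% Hypothetical module: re-encode a UTF-8 binary code point by
%% code point using bit syntax.
-module(per_codepoint).
-export([utf8_to_utf16be/1, utf8_to_utf32/1]).

%% The generator <<C/utf8>> <= Utf8 decodes one code point C at a
%% time; <<C/utf16>> encodes that single code point (big-endian by
%% default, so U+1F64C becomes the surrogate pair D83D DE4C).
utf8_to_utf16be(Utf8) when is_binary(Utf8) ->
    << <<C/utf16>> || <<C/utf8>> <= Utf8 >>.

%% Same idea with a fixed 4-byte encoding per code point.
utf8_to_utf32(Utf8) when is_binary(Utf8) ->
    << <<C/utf32>> || <<C/utf8>> <= Utf8 >>.
```

Applied to <<240,159,153,140>>, these return the same bytes as the shell examples above: <<216,61,222,76>> (shown by the shell as <<"\330=\336L">>) and <<0,1,246,76>>.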

Upvotes: 3

fenollp

Reputation: 2496

You may need to add the comment %% -*- coding: utf-8 -*- as the first line of your module, and put /utf8 on the string literal, as in <<"..."/utf8>>.

My guess is that you are using an Erlang/OTP release older than 17, where source files are read as Latin-1 unless specified otherwise.
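A sketch of such a module, assuming the file is saved as UTF-8. The module and function names are hypothetical. The coding comment is only needed on releases before OTP 17. The second function uses a \x{...} escape, which produces the same binary without depending on the source file encoding at all.

```erlang
%% -*- coding: utf-8 -*-
%% Hypothetical module containing an emoji string literal.
-module(emoji_literal).
-export([raised_hands/0, raised_hands_escaped/0]).

%% /utf8 on a string literal encodes every character of the literal
%% as UTF-8. Without it, each character would be stored as a single
%% byte, which cannot represent U+1F64C.
raised_hands() ->
    <<"🙌"/utf8>>.

%% Same code point written as an escape sequence.
raised_hands_escaped() ->
    <<"\x{1F64C}"/utf8>>.
```

Both functions should return <<240,159,153,140>>, the same bytes as <<16#1f64c/utf8>>.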

Upvotes: 0
