Reputation:
Assuming I have a binary
Message = <<"string containing emoji">>.
How do I properly encode it in Unicode? I tried doing:
Encoded = <<Message/utf16>>.
I get this warning when compiling the file:
Warning: binary construction will fail with a 'badarg' exception (invalid Unicode code point in a utf8/utf16/utf32 segment)
I tried this with /utf8 as well. Same warning.
Upvotes: 0
Views: 1482
Reputation: 41528
Assuming that the binary you start with is encoded according to UTF-8, and you need to encode it as little-endian UTF-16, this should work:
unicode:characters_to_binary(<<"string containing emoji">>, utf8, {utf16, little})
See the documentation for the unicode module for more information.
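For example, taking U+1F64C encoded as UTF-8, a shell session might look like this (a minimal sketch; the exact printed form of the result binary depends on your shell's encoding settings):
1> Message = <<16#1f64c/utf8>>.
<<240,159,153,140>>
2> unicode:characters_to_binary(Message, utf8, {utf16, little}).
<<"=\330L\336">>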
The reason why <<Message/utf16>> fails is that the utf8, utf16 and utf32 specifiers in bit syntax encode a single code point, not an entire string. So to encode the character U+1F64C, you could use:
2> <<16#1f64c/utf8>>.
<<240,159,153,140>>
3> <<16#1f64c/utf16>>.
<<"\330=\336L">>
4> <<16#1f64c/utf32>>.
<<0,1,246,76>>
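If you did want to transcode a whole UTF-8 binary with these specifiers, you could pair them with a binary comprehension, decoding one code point at a time and re-encoding it (a sketch; unicode:characters_to_binary/3 above remains the more idiomatic choice):
5> Message = <<16#1f64c/utf8>>.
<<240,159,153,140>>
6> << <<C/utf16>> || <<C/utf8>> <= Message >>.
<<"\330=\336L">>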
Upvotes: 3
Reputation: 2496
You may need to add %% -*- coding: utf-8 -*-
as the first line of your module, and use /utf8.
My guess is that you are using Erlang/OTP < 17, meaning source files are treated as latin-1 unless specified otherwise.
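For instance, a minimal module sketch (the module name emoji is made up for illustration):
%% -*- coding: utf-8 -*-
-module(emoji).
-export([message/0]).

%% Returns the string literal as a UTF-8 encoded binary.
message() ->
    <<"string containing emoji"/utf8>>.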
Upvotes: 0