Reputation: 111
Why is the following not a valid binary string?
String.valid?(<<239, 191, 191>>)
false
Upvotes: 3
Views: 2095
Reputation: 222060
The bytes 239, 191, 191 decode in UTF-8 to the Unicode code point U+FFFF:
iex(1)> <<x::utf8>> = <<239, 191, 191>>
<<239, 191, 191>>
iex(2)> x
65535
iex(3)> x == 0xFFFF
true
U+FFFF is a Unicode noncharacter. String.valid?/1 has a list of all such characters and returns false when it encounters any of them.
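To see why those three bytes yield U+FFFF, here is a minimal sketch that unpacks the standard three-byte UTF-8 layout (1110xxxx 10yyyyyy 10zzzzzz) by hand. The variable names x, y, z, and codepoint are only illustrative.

```elixir
# Three-byte UTF-8 layout: 1110xxxx 10yyyyyy 10zzzzzz.
# The literal prefixes (0b1110, 0b10, 0b10) must match for the pattern to succeed.
<<0b1110::4, x::4, 0b10::2, y::6, 0b10::2, z::6>> = <<239, 191, 191>>

# Reassemble the 4 + 6 + 6 payload bits into one 16-bit integer.
<<codepoint::16>> = <<x::4, y::6, z::6>>

IO.inspect({x, y, z, codepoint})
# prints {15, 63, 63, 65535}
```

All 16 payload bits are set, which is exactly 0xFFFF.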
I couldn't find any function in Elixir that only checks for UTF-8 validity and skips non-character checks, but it's trivial to write one:
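defmodule A do
  # Consume one well-formed UTF-8 code point at a time.
  def valid_utf8?(<<_::utf8, rest::binary>>), do: valid_utf8?(rest)
  # The whole binary was consumed, so it is valid UTF-8.
  def valid_utf8?(<<>>), do: true
  # Anything else starts with a malformed sequence.
  def valid_utf8?(_), do: false
end

for binary <- [<<0>>, <<239, 191, 191>>, <<128>>] do
  IO.inspect({binary, String.valid?(binary), A.valid_utf8?(binary)})
end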
Output:
{<<0>>, true, true}
{<<239, 191, 191>>, false, true}
{<<128>>, false, false}
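For comparison, the stricter behavior described above (reject noncharacters as well as malformed UTF-8) can be sketched the same way. This is an illustration, not Elixir's actual implementation: the module StrictSketch and its function names are hypothetical, and String.valid?/1 itself may behave differently depending on the Elixir version. The noncharacter set used here is the one Unicode defines: U+FDD0..U+FDEF plus the last two code points of every plane.

```elixir
defmodule StrictSketch do
  # Hypothetical helper module, for illustration only.

  # U+FDD0..U+FDEF are noncharacters.
  def noncharacter?(cp) when cp in 0xFDD0..0xFDEF, do: true
  # So are the last two code points of each plane: U+FFFE, U+FFFF, U+1FFFE, ...
  def noncharacter?(cp) when is_integer(cp), do: rem(cp, 0x10000) >= 0xFFFE

  # Valid only if every code point decodes as UTF-8 and none is a noncharacter.
  def valid?(<<cp::utf8, rest::binary>>) do
    if noncharacter?(cp), do: false, else: valid?(rest)
  end

  def valid?(<<>>), do: true
  def valid?(_), do: false
end

for binary <- [<<0>>, <<239, 191, 191>>, <<128>>] do
  IO.inspect({binary, StrictSketch.valid?(binary)})
end
```

With these three inputs the sketch returns true, false, false, matching the String.valid?/1 column in the output above.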
Upvotes: 8