Mukul Chakravarty

Reputation: 111

Checking the validity of a string in Elixir

Why is the following not a valid binary string?

String.valid?(<<239, 191, 191>>)
false

Upvotes: 3

Views: 2095

Answers (1)

Dogbert

Reputation: 222060

The bytes 239, 191, 191 decode in UTF-8 to the Unicode code point U+FFFF:

iex(1)> <<x::utf8>> = <<239, 191, 191>>
<<239, 191, 191>>
iex(2)> x
65535
iex(3)> x == 0xFFFF
true

U+FFFF is a Unicode noncharacter. String.valid?/1 keeps a list of all such code points and returns false when it encounters any of them.
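
To make both steps concrete, here is a minimal sketch that decodes the three-byte sequence by hand and then tests the result against the Unicode noncharacter ranges. The module name NoncharSketch and both function names are hypothetical and used only for illustration; this is not how String.valid?/1 is implemented, and decode3/1 handles only well-formed three-byte sequences.

```elixir
defmodule NoncharSketch do
  # Hypothetical helper: decode one three-byte UTF-8 sequence by hand.
  # Layout: 1110xxxx 10yyyyyy 10zzzzzz -> code point xxxxyyyyyyzzzzzz.
  # Sketch only: it does not reject overlong forms or surrogates.
  def decode3(<<0b1110::4, x::4, 0b10::2, y::6, 0b10::2, z::6>>) do
    <<cp::16>> = <<x::4, y::6, z::6>>
    cp
  end

  # Unicode reserves 66 noncharacters: U+FDD0..U+FDEF, plus the last two
  # code points of each of the 17 planes (U+nFFFE and U+nFFFF).
  def noncharacter?(cp) when cp in 0xFDD0..0xFDEF, do: true
  def noncharacter?(cp) when cp in 0..0x10FFFF, do: rem(cp, 0x10000) >= 0xFFFE
  def noncharacter?(_), do: false
end

cp = NoncharSketch.decode3(<<239, 191, 191>>)
IO.inspect({cp, NoncharSketch.noncharacter?(cp)})
# prints {65535, true}
```

The payload bits of 239, 191, 191 are 1111, 111111 and 111111, which concatenate to sixteen one bits, i.e. 0xFFFF, the last code point of plane 0.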


I couldn't find any function in Elixir that only checks for UTF-8 validity and skips non-character checks, but it's trivial to write one:

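defmodule A do
  # Consume one well-formed UTF-8 code point, then check the rest.
  def valid_utf8?(<<_::utf8, rest::binary>>), do: valid_utf8?(rest)
  # The whole binary was consumed, so every sequence was well formed.
  def valid_utf8?(<<>>), do: true
  # The next bytes do not form a valid UTF-8 sequence.
  def valid_utf8?(_), do: false
end

# Compare String.valid?/1 with the hand-written check on three inputs.
for binary <- [<<0>>, <<239, 191, 191>>, <<128>>] do
  IO.inspect({binary, String.valid?(binary), A.valid_utf8?(binary)})
end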

Output:

{<<0>>, true, true}
{<<239, 191, 191>>, false, true}
{<<128>>, false, false}

Upvotes: 8
