Alex Craft

Reputation: 15336

Why can't Elixir print a Unicode string after splitting it?

This works: "ы д" |> IO.puts

But this does not: "ы д" |> String.split(~r/[^а-я]+/) |> hd |> IO.puts

** (ArgumentError) argument error
    (stdlib) :io.put_chars(#PID<0.26.0>, :unicode, [<<209>>, 10])

Why?

Upvotes: 1

Views: 209

Answers (1)

Dogbert

Reputation: 222060

Regexes in Elixir do not match on Unicode codepoints by default; the pattern and the string are treated as raw bytes. You need to pass the u modifier to enable matching on Unicode codepoints:

iex(1)> "ы д" |> String.split(~r/[^а-я]+/u)
["ы", "д"]
iex(2)> "ы д" |> String.split(~r/[^а-я]+/u) |> hd
"ы"

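Without u, the split can cut through the middle of a multi-byte character, so the returned pieces are not necessarily valid UTF-8. That is why IO.puts raises: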

iex(1)> "ы д" |> String.split(~r/[^а-я]+/)
[<<209>>, "д"]
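To see where the stray <<209>> comes from, it helps to look at the bytes. A rough sketch (the variable names broken, first and second are just for illustration): in UTF-8, "ы" is the two bytes 209, 139, and without u the character class is read as a set of single bytes, so the second byte of "ы" ends up being treated as a separator while the first byte is kept.

```elixir
# UTF-8 byte layout of the characters involved.
<<209, 139>> = "ы"
<<208, 176>> = "а"
<<209, 143>> = "я"

# Without the u modifier the first piece is a lone lead byte,
# which is not a valid UTF-8 string on its own.
[broken, _] = String.split("ы д", ~r/[^а-я]+/)
IO.inspect(broken)                 # <<209>>
IO.inspect(String.valid?(broken))  # false

# With the u modifier every piece is made of whole codepoints.
[first, second] = String.split("ы д", ~r/[^а-я]+/u)
IO.inspect(String.valid?(first))   # true
IO.inspect(String.valid?(second))  # true
IO.puts(first)                     # ы
```

String.valid?/1 is a quick way to check whether a binary can safely be passed to IO.puts.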

Upvotes: 3
