This works: "ы д" |> IO.puts
But this does not: "ы д" |> String.split(~r/[^а-я]+/) |> hd |> IO.puts
** (ArgumentError) argument error
(stdlib) :io.put_chars(#PID<0.26.0>, :unicode, [<<209>>, 10])
Why?
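Regexes in Elixir are not Unicode-codepoint-based by default: both the pattern and the string are matched byte by byte, so [а-я] is treated as a set of bytes rather than a range of Cyrillic letters. You need to pass the u modifier to enable matching on Unicode codepoints: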
iex(1)> "ы д" |> String.split(~r/[^а-я]+/u)
["ы", "д"]
iex(2)> "ы д" |> String.split(~r/[^а-я]+/u) |> hd
"ы"
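Without u, the split can land in the middle of a multi-byte character, so the returned strings are not always valid UTF-8. Here the first element, <<209>>, is only the first byte of "ы", which is why IO.puts raises ArgumentError when writing it: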
iex(1)> "ы д" |> String.split(~r/[^а-я]+/)
[<<209>>, "д"]
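To see exactly where <<209>> comes from, it helps to look at the raw bytes. In UTF-8, "ы" is the two bytes 209, 139 and "д" is 208, 180. Without u, the class [а-я] is compiled from the bytes of "а-я" (208, 176, "-", 209, 143), i.e. the byte set {208, 176..209, 143}. In "ы д" the only bytes outside that set are 139 and the space (32), so [^а-я]+ matches exactly those two bytes and the split cuts "ы" in half. The sketch below puts both variants side by side; the module name CyrillicSplit is hypothetical and only used for illustration.

```elixir
defmodule CyrillicSplit do
  # Hypothetical helper module, for illustration only.

  # Without `u`, the regex works on raw bytes: [а-я] becomes the byte set
  # 208, 176..209, 143, so a split can cut a two-byte character in half.
  def bytewise(str), do: String.split(str, ~r/[^а-я]+/)

  # With `u`, the regex works on Unicode codepoints, so [а-я] means
  # the characters а through я.
  def unicode(str), do: String.split(str, ~r/[^а-я]+/u)
end

input = "ы д"

# Raw UTF-8 bytes of the input string.
IO.inspect(:binary.bin_to_list(input), label: "bytes")

# Byte-based split: the first piece is an incomplete UTF-8 sequence.
pieces = CyrillicSplit.bytewise(input)
IO.inspect(pieces, label: "bytewise")
IO.inspect(Enum.map(pieces, &String.valid?/1), label: "valid UTF-8?")

# Codepoint-based split: both pieces are whole characters.
IO.inspect(CyrillicSplit.unicode(input), label: "unicode")
```

Checking String.valid?/1 on the pieces is a cheap way to catch this kind of mismatch before handing them to IO.puts.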