Why do you need these characters in the regex in this Elixir regex match?

Question

I came across a function like this:

  @spec split_words(String.t) :: [String.t]
  defp split_words(text) do
    Regex.scan ~r/(*UTF)[\p{L}0-9-]+/i, text
  end

It seems to be written this way so that the following test passes:

  test "German" do
    expected = %{"götterfunken" => 1, "schöner" => 1, "freude" => 1}
    assert Words.count("Freude schöner Götterfunken") == expected
  end

What is (*UTF)? Is it specific to Elixir, or is it a general regex concept? My guess is that it makes the regex treat the string as UTF-encoded. And what about \p{L}? Is it some kind of "expander" that tells the regex to use an alphabet that includes characters with umlauts?

I saw it in this repository: https://github.com/alxndr/exercism/blob/master/elixir/word-count/word_count.exs#L25


