Reputation: 31

Elixir and multiple replace characters in string

I am a beginner and working with old database where characters like Ę,ę,Ń,ń are saved like ;;;ca ... It's Elixir language with Phoenix Framework. I want to multiple replace that characters in code, I have a function:

  def convert_content(content) do
    content = String.replace(content, ";;;ca", "Ę")
    content = String.replace(content, ";;;ea", "ę")
    content = String.replace(content, ";;;d1", "Ń")
    content = String.replace(content, ";;;f1", "ń")
  end

But it is very slow.. I found https://github.com/elixir-lang/elixir/pull/4474 but it doesn't work. Thanks for help.

Upvotes: 3

Answers (2)

Christophe De Troyer

Reputation: 2922

Problem

I think your problem stems from the fact that you traverse your string n times, if you have n characters to replace.

So in order to make this faster for a single string you would want traverse your string only once. That would make it, I think, faster. I don't see an immediate answer on how you can do this, except rolling your own algorithm.

So in order to verify what I'm thinking I wrote a little script to benchmark, and turns out that the implementation you suggested is the fastest of all.

Measuring

First a note on how I tested the performance. I generated a random string to test each algorithm. So each algorithm was tested with the same input, and generating the input was not counted in the results.

Then I ran each algorithm 100 times, timing the execution with :timer.rc/1. I summed up all the results and divided them by 100 to get the average execution time.

I also, given that your question lacked the details, used my own alphabet. You can replace it as you see fit. I only assumed that the prefix of each to-replace string is is ";;;;".

Here is the alphabet.

  def alphabet do
    %{
      ";;;;a" => "a",
      ";;;;b" => "b",
      ";;;;c" => "c",
      ";;;;d" => "d",
      ";;;;e" => "e",
      ";;;;f" => "f",
      ";;;;g" => "g",
      ";;;;h" => "h",
      ";;;;i" => "i",
      ";;;;j" => "j",
      ";;;;k" => "k",
      ";;;;l" => "l",
      ";;;;m" => "m",
      ";;;;n" => "n",
      ";;;;o" => "o",
      ";;;;p" => "p",
      ";;;;q" => "q",
      ";;;;r" => "r",
      ";;;;s" => "s",
      ";;;;t" => "t",
      ";;;;u" => "u",
      ";;;;v" => "v",
      ";;;;w" => "w",
      ";;;;x" => "x",
      ";;;;y" => "y",
      ";;;;z" => "z"
    }
  end

It is implemented as a map, which should give us O(log n) lookup.

Solution 1

First I started out with a naive version; the one you showed.

  def naive(input) do
    alphabet()
    |> Map.keys()
    |> Enum.reduce(input, fn key, str ->
      String.replace(str, key, alphabet()[key])
    end)
  end

Here you simply traverse all the keys of the alphabet and check if they are present in the string, and if so, you replace all of them.

The average execution time for this function with an input size of 10000 and 100 runs is 1.40691 ms.

Solution 2

The second approach I took was to use the suggestion by the other answer here, namely using String.replace/4 instead of manually checking each occurence.

Note that I snipped out a big piece of the alphabet here for brevity.

def better(input) do
    String.replace(
      input,
      [
        ";;;;a",
        ";;;;b",
        ...
        ";;;;y",
        ";;;;z"
      ],
      fn
        ";;;;a" -> "a"
        ";;;;b" -> "b"
        ...
        ";;;;y" -> "y"
        ";;;;z" -> "z"
      end
    )
  end

The average execution time for this function with an input size of 10000 and 100 runs is 1.3419400000000001 ms

Solution 3

A final solution is on my part, where I tried rolling my own algorithm.

The idea here is to traverse the string, and as soon as we see that the string starts with four ";" characters, we can replace based on he fifth character.

  def alphabet2 do
    %{
      ?a => ?a,
      ?b => ?b,
      ...
      ?y => ?y,
      ?z => ?z
    }
  end

  def process(cl, acc) do
    case cl do
      [] ->
        acc

      [?;, ?;, ?;, ?;, c | r] ->
        new_char = alphabet2()[c]
        process(r, [new_char | acc])

      [c | r] ->
        process(r, [c | acc])
    end
  end

  def even_better(input) do
    cl = String.to_charlist(input)
    process(cl, []) |> Enum.reverse() |> List.to_string()
  end

The average execution time for this function with an input size of 10000 and 100 runs is 1.21495 ms.

Conclusion

Your solution is fast enough for what you have. The only thing I can recommend doing is that you parallelize the processing of a batch of strings. You can not make a single string faster, but you can more than easily process a bunch of strings faster.

Benchmarks

The benchmark code I used is the following.

avg_ms =
  1..runs
  |> Enum.map(fn _ -> :timer.tc(fn -> even_better(str) end) end)
  |> Enum.reduce(0, fn {time, _}, acc -> acc + time end)
  |> (fn s -> s / runs / 1000 end).()

IO.puts("Even Better took avg #{avg_ms} ms")

Also note that these solutions can be made prettier by using some macros. See the other answer for that.

Upvotes: 10

Aleksei Matiushkin

Reputation: 121000

String.replace/4 accepts a list of replacements as a pattern and a function.

to_replace = ~w|;;;ca ;;;ea ;;;d1 ;;;f1|
content = Enum.join(to_replace, " | ")
#⇒ ";;;ca | ;;;ea | ;;;d1 | ;;;f1"
String.replace(content, to_replace, fn
  ";;;ca" -> "Ę"
  ";;;ea" -> "ę"
  ";;;d1" -> "Ń"
  ";;;f1" -> "ń"
end)
#⇒ "Ę | ę | Ń | ń"

One might also use a bit of metaprogramming to produce function clauses if there are many items to replace.

defmodule R do                           
  @r ~w|;;;ca ;;;ea ;;;d1 ;;;f1|
  @s ~w|Ę ę Ń ń|
  Enum.each(Enum.zip(@r, @s), fn {r, s} ->
    defp one(unquote(r)), do: unquote(s)
  end)
  def all(content) do
    String.replace(content, @r, &one/1)
  end
end

R.all("|;;;ca|;;;ea|;;;d1|;;;f1")
#⇒ "|Ę|ę|Ń|ń"

Upvotes: 6