Reputation: 31
I am a beginner and working with old database where characters like Ę,ę,Ń,ń
are saved like ;;;ca ...
It's Elixir language with Phoenix Framework.
I want to multiple replace that characters in code, I have a function:
def convert_content(content) do
content = String.replace(content, ";;;ca", "Ę")
content = String.replace(content, ";;;ea", "ę")
content = String.replace(content, ";;;d1", "Ń")
content = String.replace(content, ";;;f1", "ń")
end
But it is very slow.. I found https://github.com/elixir-lang/elixir/pull/4474 but it doesn't work. Thanks for help.
Upvotes: 3
Views: 3481
Reputation: 2922
I think your problem stems from the fact that you traverse your string n
times, if you have n
characters to replace.
So in order to make this faster for a single string you would want traverse your string only once. That would make it, I think, faster. I don't see an immediate answer on how you can do this, except rolling your own algorithm.
So in order to verify what I'm thinking I wrote a little script to benchmark, and turns out that the implementation you suggested is the fastest of all.
First a note on how I tested the performance. I generated a random string to test each algorithm. So each algorithm was tested with the same input, and generating the input was not counted in the results.
Then I ran each algorithm 100 times, timing the execution with :timer.rc/1
. I summed up all the results and divided them by 100 to get the average execution time.
I also, given that your question lacked the details, used my own alphabet. You can replace it as you see fit. I only assumed that the prefix of each to-replace string is is ";;;;".
Here is the alphabet.
def alphabet do
%{
";;;;a" => "a",
";;;;b" => "b",
";;;;c" => "c",
";;;;d" => "d",
";;;;e" => "e",
";;;;f" => "f",
";;;;g" => "g",
";;;;h" => "h",
";;;;i" => "i",
";;;;j" => "j",
";;;;k" => "k",
";;;;l" => "l",
";;;;m" => "m",
";;;;n" => "n",
";;;;o" => "o",
";;;;p" => "p",
";;;;q" => "q",
";;;;r" => "r",
";;;;s" => "s",
";;;;t" => "t",
";;;;u" => "u",
";;;;v" => "v",
";;;;w" => "w",
";;;;x" => "x",
";;;;y" => "y",
";;;;z" => "z"
}
end
It is implemented as a map, which should give us O(log n) lookup.
First I started out with a naive version; the one you showed.
def naive(input) do
alphabet()
|> Map.keys()
|> Enum.reduce(input, fn key, str ->
String.replace(str, key, alphabet()[key])
end)
end
Here you simply traverse all the keys of the alphabet and check if they are present in the string, and if so, you replace all of them.
The average execution time for this function with an input size of 10000 and 100 runs is 1.40691 ms.
The second approach I took was to use the suggestion by the other answer here, namely using String.replace/4
instead of manually checking each occurence.
Note that I snipped out a big piece of the alphabet here for brevity.
def better(input) do
String.replace(
input,
[
";;;;a",
";;;;b",
...
";;;;y",
";;;;z"
],
fn
";;;;a" -> "a"
";;;;b" -> "b"
...
";;;;y" -> "y"
";;;;z" -> "z"
end
)
end
The average execution time for this function with an input size of 10000 and 100 runs is 1.3419400000000001 ms
A final solution is on my part, where I tried rolling my own algorithm.
The idea here is to traverse the string, and as soon as we see that the string starts with four ";" characters, we can replace based on he fifth character.
def alphabet2 do
%{
?a => ?a,
?b => ?b,
...
?y => ?y,
?z => ?z
}
end
def process(cl, acc) do
case cl do
[] ->
acc
[?;, ?;, ?;, ?;, c | r] ->
new_char = alphabet2()[c]
process(r, [new_char | acc])
[c | r] ->
process(r, [c | acc])
end
end
def even_better(input) do
cl = String.to_charlist(input)
process(cl, []) |> Enum.reverse() |> List.to_string()
end
The average execution time for this function with an input size of 10000 and 100 runs is 1.21495 ms.
Your solution is fast enough for what you have. The only thing I can recommend doing is that you parallelize the processing of a batch of strings. You can not make a single string faster, but you can more than easily process a bunch of strings faster.
The benchmark code I used is the following.
avg_ms =
1..runs
|> Enum.map(fn _ -> :timer.tc(fn -> even_better(str) end) end)
|> Enum.reduce(0, fn {time, _}, acc -> acc + time end)
|> (fn s -> s / runs / 1000 end).()
IO.puts("Even Better took avg #{avg_ms} ms")
Also note that these solutions can be made prettier by using some macros. See the other answer for that.
Upvotes: 10
Reputation: 121000
String.replace/4
accepts a list of replacements as a pattern and a function.
to_replace = ~w|;;;ca ;;;ea ;;;d1 ;;;f1|
content = Enum.join(to_replace, " | ")
#⇒ ";;;ca | ;;;ea | ;;;d1 | ;;;f1"
String.replace(content, to_replace, fn
";;;ca" -> "Ę"
";;;ea" -> "ę"
";;;d1" -> "Ń"
";;;f1" -> "ń"
end)
#⇒ "Ę | ę | Ń | ń"
One might also use a bit of metaprogramming to produce function clauses if there are many items to replace.
defmodule R do
@r ~w|;;;ca ;;;ea ;;;d1 ;;;f1|
@s ~w|Ę ę Ń ń|
Enum.each(Enum.zip(@r, @s), fn {r, s} ->
defp one(unquote(r)), do: unquote(s)
end)
def all(content) do
String.replace(content, @r, &one/1)
end
end
R.all("|;;;ca|;;;ea|;;;d1|;;;f1")
#⇒ "|Ę|ę|Ń|ń"
Upvotes: 6