Reputation:
For the purpose of learning I've tried to come up with a string replacement function, here's what I have now:
let string_replace where what replacement =
let wlength = (String.length what - 1) in
let length = (String.length where - 1) in
let rec loop i shift =
let [shift, i] =
if wlength == shift then
(String.blit replacement 0 where (i - wlength) (wlength + 1); [0, i])
else if (String.get where i == String.get what shift) then
[shift + 1, i]
else [0, i - shift]
in if i < length then loop (i + 1) shift
in loop 0 0; where;;
Now, there's a test here: http://raid6.com.au/~onlyjob/posts/arena/ and so I wanted to see how am I doing relatively. And, here, I wrote the test:
let test =
let initial = "abcdefgh" ^ "efghefgh" in
let initial_time = Sys.time() in
let initial_length = String.length initial in
let imax = (1024 / (String.length initial)) * 1024 * 4 in
let rec loop s i =
if i < imax then
let ns = string_replace (s ^ initial) "efgh" "____" in
let sl = initial_length * i in
if (sl mod 1024) * 256 == 0 then
Printf.printf "%f s | %d Kb |\n" (Sys.time() -. initial_time) (sl / 1024);
loop ns (i + 1)
in loop initial 0;;
test;;
I have tried to follow as closely as possible the JavaScript test in order to be able to compare the results (I did some formatting here and also to spare you the searching):
function test() {
var str = "abcdefgh" + "efghefgh",
imax = 1024 / str.length * 1024 * 4,
time = new Date(), gstr = "", i = 0, lngth;
console.log("exec.tm.sec\tstr.length");
while (i++ < imax + 1000) {
gstr += str;
gstr = gstr.replace(/efgh/g, "____");
lngth = str.length * i;
if (lngth % 1024 * 256 === 0) {
var curdate = new Date();
console.log((curdate.getTime() - time.getTime()) / 1000 +
"sec\t\t" + lngth / 1024 + "kb");
}
}
}
Here are the results I'm getting...
| JavaScript | OCaml | String size |
|------------+--------------+-------------|
| 0.78 s | 14.576784 s | 256 Kb |
| 3.266 s | 58.468111 s | 512 Kb |
| 7.618 s | 132.954788 s | 768 Kb |
| 13.849 s | 238.793697 s | 1024 Kb |
| 46.243 s | 379.175356 s | 1280 Kb |
| 86.046 s | 550.838260 s | 1536 Kb |
So, my question: is there anything particularly wrong with my string replacement function? Or is this speed to be expected? I've compiled the code with ocamlopt
.
Here's my another try:
let string_index_of where what =
let length = (String.length where - 1) in
let pattern = (1 lsl (String.length what + 1)) - 1 in
let rec loop i j mask =
if i == length + 1 then (-1)
else if mask == pattern then i - j
else if (where.[i]) == (what.[j]) then
loop (i + 1) (j + 1) ((mask lsl 1) lor 1)
else loop (i + 1 - j) 0 1
in loop 0 0 1;;
let test =
let initial = "abcdefgh" ^ "efghefgh" in
let replacement = "____" in
let replacement_length = String.length replacement in
let initial_time = Sys.time() in
let initial_length = String.length initial in
let imax = (1024 / (String.length initial)) * 1024 * 4 in
let rec loop s i =
if i < imax then
let concatenated = s ^ "efgh" in
let sl = initial_length * i in
let rec replace_loop st =
let index = string_index_of st "efgh" in
if index >= 0 then
((String.blit replacement 0 st index replacement_length);
replace_loop st)
in replace_loop concatenated;
if sl mod (1024 * 256) == 0 then
Printf.printf "%f s | %d Kb |\n" (Sys.time() -. initial_time) (sl / 1024);
loop concatenated (i + 1)
in loop initial 0;;
test;;
I've also made a larger table (including some other languages for comparison):
| JavaScript V8 | ocamlopt | ocamlopt bitup | SBCL | C gcc 4 -O3 | String size |
|---------------+--------------+----------------+-----------+-------------+-------------|
| 0.78 s | 14.576784 s | 4.615298 s | 17.463 s | 1 s | 256 Kb |
| 3.266 s | 58.468111 s | 18.474191 s | 68.484 s | 4 s | 512 Kb |
| 7.618 s | 132.954788 s | 41.673664 s | 153.99 s | 10 s | 768 Kb |
| 13.849 s | 238.793697 s | 74.652651 s | 275.204 s | 17 s | 1024 Kb |
| 46.243 s | 379.175356 s | 116.517287 s | 431.768 s | 27 s | 1280 Kb |
| 86.046 s | 550.838260 s | 166.662663 s | 624.559 s | 38 s | 1536 Kb |
Still I'd hope to do better then this.
Upvotes: 3
Views: 291
Reputation: 3077
I find it some difficult to match Javascript performance, most of the engines (js) use ropes as internal data structure for strings (https://en.wikipedia.org/wiki/Rope_%28data_structure%29) .
So each time you do a substring, concat or any function that should create a new string (or are internally called), ocaml constructs a new string while javascript uses ropes to reference the previous one.
If you want to see it, check everytime create is used: (https://github.com/ocaml/ocaml/blob/trunk/stdlib/string.ml). I wanted to show you how V8 optimizes it reusing string chunks, but couldn't find it easily, so will let it as an exercise for you.
Hope it helped understand the difference and why javscript still performs better than C when it doesn't have to handle much ropes.
Upvotes: 2
Reputation: 6379
Your string search has a bad complexity, you should try a more efficent algorithm for string search (here is list, I recommend KMP)
It seems that you are new to OCaml, so here are some differences with the language you are used to:
=
and not ==
which tests for physical equality (for int
there is no difference but it will cause you some trouble on other types).(
and )
, not [
and ]
which are for lists, so [shift, i]
is a list which contains one element which is a pair. And I guess Ocaml complains about this non exaustive pattern matching.String.get s i
can be written s.[i]
.Upvotes: 3