abo-abo
abo-abo

Reputation: 20372

How to make this Clojure code faster?

Task statement: concatenate two lists of 1e7 elements and find their sum. I'm trying to figure out an idiomatic way to write this in Clojure. And possibly also a fast non-idiomatic way, if warranted.

Here's what I got so far:

(def a (doall (vec (repeat 1e7 1))))
(def b (doall (vec (repeat 1e7 1))))
(println "Clojure:")
(time (def c (concat a b)))
(time (reduce + c))

Here's the result, using 1.9.0 with the shell command clojure -e '(load-file "clojure/concat.clj")':

Clojure:
"Elapsed time: 0.042615 msecs"
"Elapsed time: 636.798833 msecs"
20000000

There's quite a lot of room for improvement, comparing to trivial implementations in Python (156ms), Java (159ms), SBCL (120ms), and C++ using STL algorithms (60ms).

Upvotes: 0

Views: 276

Answers (1)

Alan Thompson
Alan Thompson

Reputation: 29984

I was curious about the tradeoff between just adding the numbers vs the memory allocations, so I wrote a bit of test code that uses both Clojure vectors and primitive (java) arrays. Results:

; verify we added numbers in (range 1e7) once or twice 
(sum-vec)        => 49999995000000 
(into-sum-vec)   => 99999990000000

ARRAY  power =  7 
"Elapsed time: 21.840198 msecs"       ; sum once 
"Elapsed time: 45.036781 msecs"       ; 2 sub-sums, then add sub-totals 
(timing (sum-sum-arr)) => 99999990000000 
"Elapsed time: 397.254961 msecs"      ; copy into 2x array, then sum 
(timing (sum-arr2)) => 99999990000000

VECTOR  power =  7  
"Elapsed time: 112.522111 msecs"    ; sum once from vector 
"Elapsed time: 387.757729 msecs"    ; make 2x vector, then sum

So we see that, using primitive long arrays (on my machine), we need 21 ms to sum 1e7 integers. If we do that sum twice and add the sub-totals, we get 45 ms elapsed time.

If we allocate a new array of length 2e7, copy in the first array twice, and then sum up the values, we get about 400ms which is 8x slower than the adding alone. So we see that the memory allocation & copying is by far the largest cost.

For the native Clojure vector case, we see a time of 112 ms to just sum up a preallocated vector of 1e7 integers. Combining the orig vector with itself into a 2e7 vector, then summing costs about 400ms, similar to the low-level array case. So we see that for large lists of data the memory IO cost overwhelms the details of native Java arrays vs Clojure vectors.


Code for the above (requires [tupelo "0.9.69"] ):

(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require [criterium.core :as crit]))

(defmacro timing [& forms]
; `(crit/quick-bench ~@forms)
  `(time ~@forms)
  )

(def power 7)
(def reps (Math/pow 10 power))

(def data-vals (range reps))
(def data-vec (vec data-vals))
(def data-arr (long-array data-vals))

; *** BEWARE of small errors causing reflection => 1000x slowdown ***
(defn sum-arr-1 []
  (areduce data-arr i accum 0
    (+ accum (aget data-arr i)))) ;      =>  6300 ms (power 6)
(defn sum-arr []
  (let [data ^longs data-arr]
    (areduce data i accum 0
      (+ accum (aget data i))))) ;       =>     8 ms (power 6)

(defn sum-sum-arr []
  (let [data   ^longs data-arr
        sum1   (areduce data i accum 0
                 (+ accum (aget data i)))
        sum2   (areduce data i accum 0
                 (+ accum (aget data i)))
        result (+ sum1 sum2)]
    result))

(defn sum-arr2 []
  (let [data   ^longs data-arr
        data2  (long-array (* 2 reps))
        >>     (dotimes [i reps] (aset data2 i (aget data i)))
        >>     (dotimes [i reps] (aset data2 (+ reps i) (aget data i)))
        result (areduce data2 i accum 0
                 (+ accum (aget data2 i)))]
    result))


(defn sum-vec      [] (reduce + data-vec))
(defn into-sum-vec [] (reduce + (into data-vec data-vec)))

(dotest
  (is= (spyx (sum-vec))
    (sum-arr))

  (is= (spyx (into-sum-vec))
    (sum-arr2)
    (sum-sum-arr))

  (newline) (println "-----------------------------------------------------------------------------")
  (println "ARRAY  power = " power)
  (timing (sum-arr))
  (spyx (timing (sum-sum-arr)))
  (spyx (timing (sum-arr2)))

  (newline) (println "-----------------------------------------------------------------------------")
  (println "VECTOR  power = " power)
  (timing (sum-vec))
  (timing (into-sum-vec))

)

You can switch from time to using Criterium by changing the comment line in the timing macro. However, Criterium is meant for short tasks and you should probably keep power to only 5 or 6.

Upvotes: 1

Related Questions