Reputation: 31
I am just beginning in Clojure and have written the following code for estimating pi via monte carlo simulation. I basically want to create X threads, each of which counts the number of random points that fall in the unit circle and returns it. My main thread then sums them all up and computes Pi.
However, running all the samples in the same thread is faster than splitting the computations among several threads via futures. Why?
(defn sumFutures [workerFutures acc i]
(if (= -1 i)
acc
(recur workerFutures (+ acc @(nth workerFutures i)) (dec i))))
(defn getResults [workerFutures numSamples]
(* 4.0 (/ (sumFutures workerFutures 0 (dec (count workerFutures))) numSamples)))
(defn isInCircle []
(let [x (rand) y (rand)]
(if (<= (+ (* x x) (* y y)) 1)
1
0)))
(defn countInCircle [remaining acc]
(if (zero? remaining)
acc
(recur (dec remaining) (+ acc (isInCircle)))))
(defn getWorker [samplesPerWorker]
(future
(countInCircle samplesPerWorker 0)))
(defn addWorker [workers samplesPerWorker]
(conj workers (getWorker samplesPerWorker)))
(defn getWorkers [workers samplesPerWorker remWorkers]
(if (not (zero? remWorkers))
(recur (addWorker workers samplesPerWorker) samplesPerWorker (dec remWorkers))
(doall workers)))
(defn main [numSamples numWorkers]
(getResults (getWorkers [] (quot numSamples numWorkers) numWorkers) numSamples))
;; Run all in 1 thread
(main 1000000 1)
;; Split among 100 futures (at least 8 threads)
;; SLOWER
(main 1000000 100)
Several other notes based on debugging findings:
The correct number of futures are being created
Each future is computing the correct number of simulations
This is running on multiple threads and processor cores
Upvotes: 3
Views: 130
Reputation: 20194
I think this will be easier to work with if you have idiomatic code.
sumFutures
is effectively reimplementing Clojure's definition of +
, using recursion directly rather than Clojures optimized reduce
implementations. Here's an alternate simpler (and likely faster) definition:
(defn sum-futures
[workers]
(apply + (map deref workers)))
getResults
is now much easier to read - and I catch a place where rational division is happening - if we don't need rationals, making one operand a double will save a lot of work.
(defn get-results
[workers num-samples]
(* 4.0 (/ (sum-futures workers)
(double num-samples))))
countInCircle
can be represented more clearly using clojure's +
as well.
(defn count-in-circle
[n]
(apply + (repeatedly n isInCircle)))
getWorkers
once again does primitive recursive work instead of using Clojure's abstractions. If we use repeatedly
, we can eliminate the addWorker
and getWorker
definitions, without reducing clarity, modularity, or efficiency (in fact in a case like this where you don't need indexed lookup, and the result will be consumed sequentially, the lazy-seq version should perform better than the vector. This is also now a cantidate for a refactoring along with sum-futures
to a more efficient transducer based version).
(defn get-workers
[num-workers samples-per-worker]
(repeatedly num-workers
#(future (count-in-circle samples-per-worker))
And finally main
becomes:
(defn main
[num-samples num-workers]
(get-results (get-workers (quot num-samples num-workers)
num-workers)
num-workers))
Upvotes: 4
Reputation: 1679
What happens if you drop the number of futures in your second run to 2 (and to 8)? Might be that the time put in managing the threads and futures is greater than the time saved by splitting the work.
It's generally counter-productive to spawn more threads than the hardware can support (as the threads will have to share cores and there's a cost to this). If you don't get a speed gain with only two threads (compared to 1), there's probably something wrong somewhere else.
Upvotes: 0