Reputation: 3525
This is my code:
(defn calculate [operators operands]
  (loop [index 0
         equation (get operands 0)]
    (if (< index (count operators))
      (recur (inc index) (list (get operators index) (get operands (inc index)) equation))
      (eval equation))))

(defn equation-true? [operators line]
  (let [result (first line)
        operands (into [] (rest line))
        ops (into [] (apply comb/cartesian-product (repeat (dec (count operands)) operators)))]
    (loop [index 0]
      (if (< index (count ops))
        (if (= result (calculate (into [] (get ops index)) operands))
          true
          (recur (inc index)))
        false))))

(defn parse-input [input]
  (->> (clojure.string/split-lines (slurp input))
       (map #(clojure.string/split % #":* +"))
       (map #(map parse-long %))))

(defn total-calibration-result-seq [input]
  (let [operators [+ *]]
    (->> (parse-input input)
         (filter #(equation-true? operators %))
         (map first)
         (apply +))))

(defn total-calibration-result-pmap [input]
  (let [operators [+ *]]
    (->> (parse-input input)
         (pmap (fn [line] [(equation-true? operators line) (first line)]))
         (filter first)
         (map second)
         (apply +))))
The last two functions produce the same result, but the -seq version runs sequentially, whereas the -pmap version uses, well, pmap and thus ought to run in parallel. However, the parallel version takes significantly more time to execute. I would like to know why.
Monitoring CPU usage with top in a separate terminal window might give a clue: the sequential version sometimes exceeds 100% CPU usage, but maybe that's due to the JVM's garbage collection.
The parallel version only uses up to 200%, and most of the time significantly less. (My CPU has 4 cores with 2 threads each; top lists 8 CPUs.) What's more interesting, though, is the load on each CPU: it stays at about 20% most of the time, individual CPUs sometimes go idle, and the individual load never exceeds 40%.
Why does the parallel version not create a load approaching 100% on each cpu?
I've tried to create a more minimal example, but then pmap behaved as expected. The problem must therefore be with this code.
Upvotes: 0
Views: 60
Reputation: 29958
Whenever I want to parallelize something in Clojure, I find that the Claypoole library is the best way to go.
Upvotes: 1
Reputation: 91827
pmap's parallelization plan is not very sophisticated. It does some stuff that could, in ideal circumstances, keep N+2 threads busy, where N is the number of cores it thinks you have: (+ 2 (.. Runtime getRuntime availableProcessors)). But circumstances are rarely ideal, and it doesn't strictly control the number of threads, so it's certainly possible that only one thread is actually working at times.
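As a rough way to see this on your own machine, here is a minimal sketch. The names pmap-target-threads and threads-used are made up for illustration. It evaluates the N+2 expression and then counts how many distinct threads actually ran a batch of short tasks. On chunked sequences such as range the number observed can differ from N+2, so treat the output as an observation, not a guarantee.

```clojure
;; The thread count pmap aims for on this machine (N + 2).
(def pmap-target-threads
  (+ 2 (.. Runtime getRuntime availableProcessors)))

(defn threads-used
  "Hypothetical helper: runs n short sleeping tasks through pmap and
  returns the set of names of the threads that executed them."
  [n]
  (->> (range n)
       (pmap (fn [_]
               (Thread/sleep 10)                      ; simulate a little work
               (.getName (Thread/currentThread))))
       (into #{})))                                   ; forces the lazy result

(println "pmap aims for" pmap-target-threads "threads;"
         "observed" (count (threads-used 64)) "distinct worker threads")
```

If the observed count is much lower than expected for your real workload, the tasks are probably finishing faster than pmap can hand them out.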
That's one possible explanation, but there are many more. Your task looks to me (and I assume you) like it takes quite a bit of CPU time to execute, but maybe it doesn't, in which case the coordination overhead of creating and managing a thread per line could be larger than the parallelization speedup. Or it might take you longer to create a new task (parse the input) than it does to solve that task, in which case you're still basically doing tasks serially.
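If coordination overhead is the issue, one common workaround is to hand pmap larger units of work. The following is only a sketch under that assumption; pmap-batched and its batch-size parameter are hypothetical names, and the right batch size has to be found by measuring.

```clojure
(defn pmap-batched
  "Hypothetical helper: applies f to every element of coll, but gives pmap
  batches of batch-size elements so that each parallel task does more work."
  [batch-size f coll]
  (->> (partition-all batch-size coll)
       (pmap #(mapv f %))   ; mapv is eager, so the batch is computed in the worker
       (apply concat)))     ; flatten the batches back into one sequence

;; Results come back in the original order, just like with map:
(println (pmap-batched 3 inc (range 10)))
```

In the question's code this would mean something like (pmap-batched 50 (fn [line] ...) (parse-input input)), with 50 being an arbitrary starting point.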
Or I'm sure there are many other reasons you might not be producing an ideal situation for pmap. Ultimately pmap is more of a toy and a marketing tool ("Look how easy it is to parallelize things!") than a real parallelization tool. If you want real efficient results, you have to do more work, e.g. by using java.util.concurrent's tools more directly.
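For example, a fixed-size thread pool from java.util.concurrent gives you explicit control over the number of worker threads. This is a minimal sketch, not a tuned implementation; pool-map is a hypothetical helper name, and it is eager (it realizes the whole input), unlike pmap.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(defn pool-map
  "Hypothetical helper: applies f to each element of coll on a fixed pool of
  n-threads threads and returns a vector of results in input order."
  [n-threads f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n-threads)]
    (try
      (let [tasks   (mapv (fn [x] (fn [] (f x))) coll) ; Clojure fns are Callables
            futures (.invokeAll pool tasks)]           ; blocks until all tasks finish
        (mapv #(.get ^Future %) futures))
      (finally
        (.shutdown pool)))))                           ; always release the threads

(println (pool-map 4 #(* % %) (range 10)))
```

With the question's code, the call would look like (pool-map 8 (fn [line] ...) (parse-input input)), where 8 matches the number of CPUs that top reports.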
Upvotes: 1