Konstantin Milyutin

Reputation: 12366

Understanding futures and doall in Clojure

I came across this example of futures in Clojure by Example:

(let [sleep-and-wait
      (map (fn [time]
             (future
               (Thread/sleep time)
               (println (str "slept " time " sec"))))
           [4000 5000])]
  (doall (map deref sleep-and-wait))
  (println "done"))

Since map produces a lazy sequence, I expected that no future would start until we call deref on it, and deref blocks until the future returns a result. The elements are mapped sequentially, so I expected this code to take 9 seconds, but it finishes in 5.

Could someone explain why?

Upvotes: 1

Views: 857

Answers (2)

noisesmith

Reputation: 20194

Clojure lazy-seqs make no promise to be maximally lazy.

user=> (take 1 (for [i (range 1000)] (doto i (println " - printed"))))
(0  - printed
1  - printed
2  - printed
3  - printed
4  - printed
5  - printed
6  - printed
7  - printed
8  - printed
9  - printed
10  - printed
11  - printed
12  - printed
13  - printed
14  - printed
15  - printed
16  - printed
17  - printed
18  - printed
19  - printed
20  - printed
21  - printed
22  - printed
23  - printed
24  - printed
25  - printed
26  - printed
27  - printed
28  - printed
29  - printed
30  - printed
31  - printed
0)

When seq is called on a vector (most lazy operations implicitly call seq on their collection argument), it produces a chunked seq, which realizes results in batches of up to 32 elements rather than one by one; range behaves the same way, which is why 32 lines are printed above. If you need to control how your data is consumed, you can force unchunking:

user=> (defn unchunk [s]
  (when (seq s)
    (lazy-seq
      (cons (first s)
            (unchunk (rest s))))))
#'user/unchunk
user=> (take 1 (for [i (unchunk (range 1000))] (doto i (println " - printed"))))
(0  - printed
0)

Of course, the simpler option in this case is to use a source that isn't chunked:

user=> (take 1 (for [i (take 1000 (iterate inc 0))] (doto i (println " - printed"))))
(0  - printed
0)
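
To check which sources are chunked before relying on one-at-a-time realization, the core predicate chunked-seq? can help. This is a minimal sketch, assuming a reasonably recent Clojure; the name results is only for illustration:

```clojure
;; chunked-seq? lives in clojure.core and tests for clojure.lang.IChunkedSeq.
;; Calling seq first mirrors what lazy operations such as map and for do.
(def results
  {:vector  (chunked-seq? (seq [1 2 3]))          ; vectors yield chunked seqs
   :range   (chunked-seq? (seq (range 1000)))     ; so does range
   :iterate (chunked-seq? (seq (iterate inc 0)))  ; iterate is not chunked
   :list    (chunked-seq? (seq '(1 2 3)))})       ; neither is a plain list

(println results)
```

Sources that report false here are realized one element at a time by map and for, which matches the iterate example above.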

Upvotes: 4

leetwinski

Reputation: 17859

This is connected to how map works: when the source is a chunked sequence, it does not take elements one by one but processes a whole chunk (up to 32 elements) at a time, as an optimization. Here is a small example:

user> (defn trace [x]
        (println :realizing x)
        x)
#'user/trace

user> (def m (map trace (range 1000)))
#'user/m

user> (first m)
:realizing 0
:realizing 1
:realizing 2

...
:realizing 30
:realizing 31
0

So in your case, realizing the first element of the mapped sequence does not start just one future: the vector's two elements sit in a single chunk, so both futures start at once, each on its own thread. The derefs then simply block until the longest-running future finishes, which takes 5 s.
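
The effect on the futures can be observed directly by counting how many have been created once only the first element is realized. The sketch below is illustrative, not the original example: started-after-first and its counter are hypothetical names, the sleep times are shortened to milliseconds, and unchunk is a variant of the helper from the other answer.

```clojure
;; Variant of the unchunk helper: re-wraps a seq so that it is
;; realized strictly one element at a time.
(defn unchunk [s]
  (lazy-seq
    (when-let [s (seq s)]
      (cons (first s) (unchunk (rest s))))))

(defn started-after-first
  "Builds a lazy seq of futures over coll, realizes only its first
  element, and returns how many futures had been created by then."
  [coll]
  (let [started (atom 0)
        futures (map (fn [ms]
                       (swap! started inc)          ; count each future created
                       (future (Thread/sleep ms) ms))
                     coll)]
    (first futures)                                 ; realize the first element only
    @started))

(println (started-after-first [40 50]))             ; chunked vector seq
(println (started-after-first (unchunk [40 50])))   ; unchunked seq
```

With the plain vector both futures exist after the first element is realized; with unchunk only one does, so dereferencing them in order would then wait for each sleep in turn.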

Upvotes: 1
