Mars

Reputation: 8854

Need to force realization of lazy seqs before/after element-wise imperative operations?

If I use map to perform a side-effecting/mutating operation on individual data structures specific to each member of a lazy sequence, do I need to (a) call doall first, to force realization of the original sequence before performing the imperative operations, or (b) call doall to force the side effects to occur before I map a functional operation over the resulting sequence?

I believe that no doalls are necessary when there are no dependencies between elements of any sequence: map can't apply a function to an element of a sequence until the earlier maps that produced that sequence have applied their functions to the corresponding element of the earlier sequence. Thus, for each element, the functions will be applied in the proper order, even though one of the functions produces side effects that a later function depends on. (I know that I can't assume that any element a will have been modified before element b is, but that doesn't matter.)

Is this correct?

That's the question, and if it's sufficiently clear, then there's no need to read further. The rest describes what I'm trying to do in more detail.


My application has a sequence of defrecord structures ("agents") each of which contains some core.matrix vectors (vec1, vec2) and a core.matrix matrix (mat). Suppose that for the sake of speed, I decide to (destructively, not functionally) modify the matrix.

The program performs the following three steps on each of the agents by calling map three times, once per step.

  1. Update a vector vec1 in each agent, functionally, using assoc.
  2. Modify a matrix mat in each agent based on the preceding vector (i.e. the matrix will retain a different state).
  3. Update a vector vec2 in each agent using assoc based on the state of the matrix produced by step 2.

For example, where persons is a sequence, possibly lazy (EDIT: Added outer doalls):

(doall
  (->> persons
    (map #(assoc % :vec1 (calc-vec1 %)))            ; update vec1 from person
    (map update-mat-from-vec1!)                     ; modify mat based on state of vec1
    (map #(assoc % :vec2 (calc-vec2-from-mat %))))) ; update vec2 based on state of mat

Alternatively:

(doall
  (map #(assoc % :vec2 (calc-vec2-from-mat %))     ; update vec2 based on state of mat
       (map update-mat-from-vec1!                  ; modify mat based on state of vec1
            (map #(assoc % :vec1 (calc-vec1 %)) persons)))) ; update vec1 from person

Note that no agent's state depends on the state of any other agent at any point. Do I need to add doalls?


EDIT: Overview of answers as of 4/16/2014:

I recommend reading all of the answers given, but it may seem as if they conflict. They don't, and I thought it might be useful if I summarized the main ideas:

(1) The answer to my question is "Yes": If, at the end of the process I described, one causes the entire lazy sequence to be realized, then what is done to each element will occur according to the correct sequence of steps (1, 2, 3). There is no need to apply doall before or after step 2, in which each element's data structure is mutated.

(2) But: This is a very bad idea; you are asking for trouble in the future. If at some point you inadvertently end up realizing all or part of the sequence at a time other than what you originally intended, the later steps could read values from the data structure that were put there at the wrong time, i.e. at a time other than what you expect. The step that mutates a per-element data structure won't happen until a given element of the lazy seq is realized, so if you realize it at the wrong time, you could get the wrong data in later steps. This could be the kind of bug that is very difficult to track down. (Thanks to @A.Webb for making this problem very clear.)

Upvotes: 1

Views: 609

Answers (4)

Leonid Beschastny

Reputation: 51450

You don't need to add doall between two map operations. But unless you're working in a REPL, you do need to add doall or dorun to force the execution of your lazy sequence.

This holds as long as you don't care about the order in which the operations run across different elements.

Let's consider the following example:

(defn f1 [x]
  (print "1>" x ", ")
  x)

(defn f2 [x]
  (print "2>" x ", ")
  x)

(defn foo [mycoll]
  (->> mycoll
    (map f1)
    (map f2)
    dorun))

By default Clojure will take the first chunk of mycoll and apply f1 to all elements of this chunk. Then it'll apply f2 to the resulting chunk.

If mycoll is a list or an ordinary (unchunked) lazy sequence, there are no chunks to batch, so you'll see that f1 and f2 are applied to each element in turn:

=> (foo (list \a \b))
1> a , 2> a , 1> b , 2> b , nil

or

=> (->> (iterate inc 7) (take 2) foo)
1> 7 , 2> 7 , 1> 8 , 2> 8 , nil

But if mycoll is a vector or a chunked lazy sequence, you'll see something quite different:

=> (foo [\a \b])
1> a , 1> b , 2> a , 2> b , nil

Try

=> (foo (range 50))

and you'll see that it processes elements in chunks of 32.
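If you'd rather not read the chunk size off printed output, you can count calls instead. This is only a sketch: calls and xs are names made up for the illustration, and the figure of 32 assumes that (range 50) returns a chunked sequence with the usual chunk size.

```clojure
;; Hypothetical names: `calls` counts how often the mapped fn has run.
(def calls (atom 0))

;; map is lazy, so nothing runs yet; (range 50) is assumed to be chunked.
(def xs (map (fn [x] (swap! calls inc) x) (range 50)))

(first xs) ; ask for a single element
@calls     ; how many elements were actually processed (a whole chunk)
```

Asking for one element realizes the whole first chunk, which is why side effects show up in batches.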

So, be careful using lazy calculations with side effects!

Here are some hints for you:

Always end your command with doall or dorun to force the calculation.

Use doall and comp to control the order of calculations, e.g.:

(->> [\a \b]
  ; apply both f1 and f2 before moving to the next element
  (map (comp f2 f1))
  dorun)

(->> (list \a \b)
  (map f1)
  ; process the whole sequence before applying f2
  doall
  (map f2)
  dorun)

Upvotes: 1

Arthur Ulfeldt

Reputation: 91534

You do not need to add any calls to doall, provided you do something with the results later in your program. For instance, if you ran the above maps and did nothing with the result, then none of the elements would be realized. On the other hand, if you read through the resulting sequence, to print it for instance, then your computations will happen in order for each element: for any given element, step 1 runs before step 2, which runs before step 3. With an unchunked input, steps 1, 2, and 3 happen to the first thing in the input sequence, then to the second, and so forth; with a chunked input, each step may run over a whole chunk before the next step starts. There is no need to pre-realize sequences to ensure the values are available; lazy evaluation will take care of that.
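Here is a minimal stand-in for the three steps in the question, so the per-element ordering can be checked without core.matrix. Everything in it is an assumption made for illustration: an atom plays the role of the mutable matrix, and the arithmetic replaces calc-vec1, update-mat-from-vec1! and calc-vec2-from-mat.

```clojure
;; Hypothetical data: each "person" carries an id and a mutable :mat (an atom).
(def persons
  (for [id (range 3)]
    {:id id :mat (atom nil)}))

(def result
  (doall
    (->> persons
         ;; step 1: functional update of :vec1
         (map #(assoc % :vec1 (inc (:id %))))
         ;; step 2: mutate :mat based on :vec1, then return the person
         (map (fn [p] (reset! (:mat p) (* 10 (:vec1 p))) p))
         ;; step 3: functional update of :vec2 from the mutated :mat
         (map #(assoc % :vec2 (inc @(:mat %)))))))

(map :vec2 result) ;=> (11 21 31)
```

Step 3 only ever sees a :mat that step 2 has already written for the same element, whether or not the input is chunked. The outer doall matters only for when the mutation happens, not for the order within one element.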

Upvotes: 1

A. Webb

Reputation: 26446

Use extreme caution mixing laziness with side effects

(defrecord Foo [fizz bang])

(def foos (map ->Foo (repeat 5 0) (map atom (repeat 5 1))))

(def foobars (map #(assoc % :fizz @(:bang %)) foos))

So will my fizz of foobars now be 1?

(:fizz (first foobars)) ;=> 1

Cool, now I'll leave foobars alone and work with my original foos...

(doseq [foo foos] (swap! (:bang foo) (constantly 42)))

Let's check on foobars

(:fizz (first foobars)) ;=> 1
(:fizz (second foobars)) ;=> 42

Whoops...

Generally, use doseq instead of map for your side effects or be aware of the consequences of delaying your side effects until realization.
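For comparison, here is one way to sidestep the surprise above, sketched with the same Foo record. It swaps the lazy map for the eager mapv (my choice for the illustration, not the only option), so every :fizz is read before anything is mutated.

```clojure
(defrecord Foo [fizz bang])

;; mapv builds a vector eagerly; nothing is left unrealized.
(def foos (mapv ->Foo (repeat 5 0) (map atom (repeat 5 1))))

;; Each :fizz is read from its atom right now, while the atoms still hold 1.
(def foobars (mapv #(assoc % :fizz @(:bang %)) foos))

;; Side effects go through doseq, as recommended above.
(doseq [foo foos] (reset! (:bang foo) 42))

(mapv :fizz foobars) ;=> [1 1 1 1 1]
```

Because foobars was fully computed before the doseq ran, the later mutation can no longer leak into it.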

Upvotes: 3

noisesmith

Reputation: 20194

map always produces a lazy result, even for a non-lazy input. You should call doall (or dorun if the sequence will never be used and the mapping is only done for side effects) on the output of map if you need to force some imperative side effect (for example use a file handle or db connection before it is closed).

user> (do (map println [0 1 2 3]) nil)
nil
user> (do (doall (map println [0 1 2 3])) nil)
0
1
2
3
nil
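The file-handle case can be sketched the same way. The function names lazy-upper and eager-upper are made up for this example, and a StringReader stands in for a real file so the snippet is self-contained.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Returns a lazy seq that still needs the reader after with-open has closed it.
(defn lazy-upper [text]
  (with-open [r (io/reader (java.io.StringReader. text))]
    (map str/upper-case (line-seq r))))

;; doall forces every line to be read while the reader is still open.
(defn eager-upper [text]
  (with-open [r (io/reader (java.io.StringReader. text))]
    (doall (map str/upper-case (line-seq r)))))

(eager-upper "a\nb\nc") ;=> ("A" "B" "C")

;; Realizing the lazy version later hits the closed reader.
(def lazy-outcome
  (try
    (doall (lazy-upper "a\nb\nc"))
    (catch java.io.IOException _ :reader-already-closed)))
```

The only difference between the two functions is where the doall sits relative to with-open.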

Upvotes: 0
