Reputation: 8854
If I perform a side-effecting/mutating operation on individual data structures specific to each member of a lazy sequence using map, do I need to (a) call doall first, to force realization of the original sequence before performing the imperative operations, or (b) call doall to force the side effects to occur before I map a functional operation over the resulting sequence?
I believe that no doalls are necessary when there are no dependencies between elements of any sequence, since map can't apply a function to a member of a sequence until the functions from the maps that produced that sequence have been applied to the corresponding element of the earlier sequence. Thus, for each element, the functions will be applied in the proper order, even though one of the functions produces side effects that a later function depends on. (I know that I can't assume that any element a will have been modified before element b is, but that doesn't matter.)
Is this correct?
That's the question, and if it's sufficiently clear, then there's no need to read further. The rest describes what I'm trying to do in more detail.
My application has a sequence of defrecord structures ("agents"), each of which contains some core.matrix vectors (vec1, vec2) and a core.matrix matrix (mat). Suppose that for the sake of speed, I decide to (destructively, not functionally) modify the matrix.
The program performs the following three steps on each of the agents by calling map three times, to apply each step to each agent.
1. Update vec1 in each agent, functionally, using assoc.
2. Modify mat in each agent based on the preceding vector (i.e. the matrix will retain a different state).
3. Update vec2 in each agent using assoc, based on the state of the matrix produced by step 2.
For example, where persons is a sequence, possibly lazy (EDIT: Added outer doalls):
(doall
  (->> persons
       (map #(assoc % :vec1 (calc-vec1 %)))            ; update vec1 from person
       (map update-mat-from-vec1!)                     ; modify mat based on state of vec1
       (map #(assoc % :vec2 (calc-vec2-from-mat %))))) ; update vec2 based on state of mat
Alternatively:
(doall
  (map #(assoc % :vec2 (calc-vec2-from-mat %))              ; update vec2 based on state of mat
       (map update-mat-from-vec1!                           ; modify mat based on state of vec1
            (map #(assoc % :vec1 (calc-vec1 %)) persons)))) ; update vec1 from person
Note that no agent's state depends on the state of any other agent at any point. Do I need to add doalls?
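To make the scenario concrete, here is a minimal, self-contained sketch of the three-step pipeline. It is only an illustration under stated assumptions: an atom stands in for the mutable core.matrix matrix, plain numbers stand in for the vectors, and the bodies of calc-vec1, update-mat-from-vec1! and calc-vec2-from-mat are made up for the example. It also assumes that update-mat-from-vec1! returns the agent it was given, which the pipeline above needs in order to work at all.

```clojure
;; Hypothetical stand-ins: an atom plays the role of the mutable matrix,
;; and plain numbers play the role of the core.matrix vectors.
(defrecord Person [vec1 mat vec2])

;; Step 1 helper: pure, computes a new vec1 from the person.
(defn calc-vec1 [p] (inc (:vec1 p)))

;; Step 2: mutates the person's own atom, then returns the person
;; so that the next map stage receives it.
(defn update-mat-from-vec1! [p]
  (reset! (:mat p) (* 10 (:vec1 p)))
  p)

;; Step 3 helper: reads the state that step 2 left in the atom.
(defn calc-vec2-from-mat [p] (+ 1 @(:mat p)))

;; A lazy sequence of persons, each with its own atom.
(def persons
  (map #(->Person % (atom 0) nil) [1 2 3]))

;; Only the outer doall is used; no doall between the map stages.
(def result
  (doall
    (->> persons
         (map #(assoc % :vec1 (calc-vec1 %)))
         (map update-mat-from-vec1!)
         (map #(assoc % :vec2 (calc-vec2-from-mat %))))))

(map :vec2 result) ;=> (21 31 41)
```

Each vec2 reflects the matrix state written by step 2 for the same person, which is the per-element ordering the question asks about.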
EDIT: Overview of answers as of 4/16/2014:
I recommend reading all of the answers given, but it may seem as if they conflict. They don't, and I thought it might be useful if I summarized the main ideas:
(1) The answer to my question is "Yes": If, at the end of the process I described, one causes the entire lazy sequence to be realized, then what is done to each element will occur according to the correct sequence of steps (1, 2, 3). There is no need to apply doall before or after step 2, in which each element's data structure is mutated.
(2) But: This is a very bad idea; you are asking for trouble in the future. If at some point you inadvertently end up realizing all or part of the sequence at a time other than the one you originally intended, the later steps could read values from the data structure that were put there at the wrong time, that is, at a time other than the one you expect. The step that mutates a per-element data structure won't happen until a given element of the lazy seq is realized, so if you realize it at the wrong time, you could get the wrong data in later steps. This could be the kind of bug that is very difficult to track down. (Thanks to @A.Webb for making this problem very clear.)
Upvotes: 1
Views: 609
Reputation: 51450
You don't need to add doall between two map operations. But unless you're working in a REPL, you do need to add doall or dorun to force the execution of your lazy sequence.
This is true, unless you care about the order of operations.
Let's consider the following example:
(defn f1 [x]
  (print "1>" x ", ")
  x)
(defn f2 [x]
  (print "2>" x ", ")
  x)
(defn foo [mycoll]
  (->> mycoll
       (map f1)
       (map f2)
       dorun))
By default Clojure will take the first chunk of mycoll and apply f1 to all elements of this chunk. Then it'll apply f2 to the resulting chunk.
So, if mycoll is a list or an ordinary lazy sequence, you'll see that f1 and f2 are applied to each element in turn:
=> (foo (list \a \b))
1> a , 2> a , 1> b , 2> b , nil
or
=> (->> (iterate inc 7) (take 2) foo)
1> 7 , 2> 7 , 1> 8 , 2> 8 , nil
But if mycoll is a vector or a chunked lazy sequence, you'll see quite a different thing:
=> (foo [\a \b])
1> a , 1> b , 2> a , 2> b , nil
Try
=> (foo (range 50))
and you'll see that it processes elements in chunks of 32.
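The chunking behaviour can also be checked without reading printed output. The sketch below is only an illustration: call-log, g1 and g2 are hypothetical names introduced here, and the functions record their calls in an atom instead of printing.

```clojure
;; Hypothetical helpers: record each call in an atom instead of printing.
(def call-log (atom []))

(defn g1 [x] (swap! call-log conj [:g1 x]) x)
(defn g2 [x] (swap! call-log conj [:g2 x]) x)

;; (range 50) is a chunked sequence, so map processes it a chunk at a time.
(dorun (->> (range 50) (map g1) (map g2)))

;; Stage tags in the order the calls actually happened.
(def stages (mapv first @call-log))

;; Length of the leading run of g1 calls before the first g2 call.
;; With the usual chunk size this is 32.
(count (take-while #{:g1} stages))
```

For any single element, g1 is still called before g2; what chunking changes is how the calls for different elements are interleaved.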
So, be careful using lazy calculations with side effects!
Here are some hints for you:
Always end your command with doall or dorun to force the calculation.
Use doall and comp to control the order of calculations, e.g.:
(->> [\a \b]
     ; apply both f1 and f2 before moving to the next element
     (map (comp f2 f1))
     dorun)
(->> (list \a \b)
     (map f1)
     ; process the whole sequence before applying f2
     doall
     (map f2)
     dorun)
Upvotes: 1
Reputation: 91534
You do not need to add any calls to doall provided you do something with the results later in your program. For instance, if you ran the above maps and did nothing with the result, then none of the elements would be realized. On the other hand, if you read through the resulting sequence, to print it for instance, then each of your computations will happen in order on each element sequentially. That is, steps 1, 2, and 3 will happen to the first thing in the input sequence, then steps 1, 2, and 3 will happen to the second, and so forth. (With chunked input such as a vector, the steps are interleaved a chunk at a time rather than strictly element by element, but each element still sees steps 1, 2, and 3 in that order.) There is no need to pre-realize sequences to ensure the values are available; lazy evaluation will take care of that.
Upvotes: 1
Reputation: 26446
Use extreme caution mixing laziness with side effects
(defrecord Foo [fizz bang])
(def foos (map ->Foo (repeat 5 0) (map atom (repeat 5 1))))
(def foobars (map #(assoc % :fizz @(:bang %)) foos))
So will my fizz of foobars now be 1?
(:fizz (first foobars)) ;=> 1
Cool, now I'll leave foobars alone and work with my original foos...
(doseq [foo foos] (swap! (:bang foo) (constantly 42)))
Let's check on foobars
(:fizz (first foobars)) ;=> 1
(:fizz (second foobars)) ;=> 42
Whoops...
Generally, use doseq instead of map for your side effects, or be aware of the consequences of delaying your side effects until realization.
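As a rough sketch of that advice, the mutating step can be run eagerly with doseq while the purely functional step stays a map. The names items and summaries and the map-with-an-atom data shape are hypothetical, chosen only to keep the example self-contained.

```clojure
;; Hypothetical data: each item carries its own atom as mutable state.
(def items (mapv (fn [n] {:n n :state (atom 0)}) [1 2 3]))

;; The side-effecting step runs immediately; doseq is eager and returns nil.
(doseq [item items]
  (reset! (:state item) (* 10 (:n item))))

;; The functional step can stay lazy, provided nothing mutates the atoms
;; again before this sequence is realized.
(def summaries
  (map (fn [item] (assoc item :seen @(:state item))) items))

(map :seen summaries) ;=> (10 20 30)
```

Because the mutation has finished before summaries is even defined, the moment at which summaries is realized no longer determines which state it observes, as long as the atoms are left alone afterwards.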
Upvotes: 3
Reputation: 20194
map always produces a lazy result, even for a non-lazy input. You should call doall (or dorun if the sequence will never be used and the mapping is only done for side effects) on the output of map if you need to force some imperative side effect (for example, to use a file handle or db connection before it is closed).
user> (do (map println [0 1 2 3]) nil)
nil
user> (do (doall (map println [0 1 2 3])) nil)
0
1
2
3
nil
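The file-handle case can be sketched like this. It is an illustration only: read-lines-eager is a hypothetical helper, and a StringReader stands in for a real file so that the example is self-contained.

```clojure
;; Hypothetical helper: read all lines while the reader is still open.
(defn read-lines-eager [s]
  (with-open [rdr (java.io.BufferedReader. (java.io.StringReader. s))]
    ;; doall realizes the whole line-seq before with-open closes rdr.
    ;; Without it, the remaining lines would only be read later,
    ;; after the reader has already been closed.
    (doall (line-seq rdr))))

(read-lines-eager "a\nb\nc") ;=> ("a" "b" "c")
```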
Upvotes: 0