Reputation: 577
I'm implementing a simple algorithm in Clojure that insists on blowing the memory, even with :jvm-opts ["-Xmx4G"]
set on my project.clj
. assuming the following data structures:
(def inf Double/POSITIVE_INFINITY)
(def min-dist (atom {:1 {:1 0 :2 4} :2 {:1 4 :2 0 :3 5} :3 {:2 5 :3 0}}))
(def vertexes [:1 :2 :3])
The following will run out of memory on bigger inputs (|vertexes| = 100
):
(for [k vertexes i vertexes j vertexes]
(do
(println " " i " " j " "k)
(let [s (+ (get-in @min-dist [i k] inf) (get-in @min-dist [k j] inf))]
(if (> (get-in @min-dist [i j] inf) s) (swap! min-dist assoc-in [i j] s)))))
output:
OutOfMemoryError Java heap space java.util.Arrays.copyOf (Arrays.java:2367)
I'm pretty sure that that is a reduce
option here that will make everything clean and fast, but I just can't find it. It looks like swap!
takes a lot of memory space, am I right?
Two bonus questions:
if I remove the println
line (as well as the do, of course), the code will run fast, but min-dist
will not be updated, as if the loops were not executed. Why is it?
The same behavior is seen when running with lein run
, even with the println
in there. Why?
Any help to a new Clojurist will be appreciated. =)
Upvotes: 2
Views: 349
Reputation: 91567
your sub question #1 is the key.
for
produces a lazy list so no work is done unless the result is actually read. if you want the result to be evaluated you could wrap the whole thing in a call to dorun
which traverses the list without keeping the whole thing in memory. I refer to this situation in general as "being bitten by the lazy bug" and it happens to most Clojurians more often than they let on ;-)
user> @min-dist
{:1 {:1 0, :2 4}, :2 {:1 4, :2 0, :3 5}, :3 {:2 5, :3 0}}
user> (time
(dorun (for [k vertexes i vertexes j vertexes]
(let [s (+ (get-in @min-dist [i k] inf) (get-in @min-dist [k j] inf))]
(if (> (get-in @min-dist [i j] inf) s) (swap! min-dist assoc-in [i j] s))))))
"Elapsed time: 4.272336 msecs"
nil
user> @min-dist
{:1 {:1 0, :2 4, :3 9}, :2 {:1 4, :2 0, :3 5}, :3 {:2 5, :3 0, :1 9}}
removing the println is a good idea because with a list of 100 vertexes that for expression (it's not a loop in the sense that word has in other languages) is going to run 1 million times (100 * 100 * 100), so printing it out will take a while.
and because i'm a sucker for bonus points: here it is using reduce:
user> (def min-dist {:1 {:1 0 :2 4} :2 {:1 4 :2 0 :3 5} :3 {:2 5 :3 0}})
#'user/min-dist
user> (time
(->> (for [k vertexes i vertexes j vertexes] ;; start by making a sequence of all the combinatioins of indexes
[i j k])
(reduce
(fn [result [i j k]] ;; the reducer function takes the result so far and either modifies it
(let [s (+ (get-in result [i k] inf) ;; or passes it through unchanged.
(get-in result [k j] inf))]
(if (> (get-in result [i j] inf) s) ;; depending on this if expression here.
(assoc-in result [i j] s) ;; pass on the changed value
result))) ;; pass on the original value, unchanged
min-dist))) ;; this argument is the initial value.
;; the ->> expression places the result of the for expression here. (as the last argument)
"Elapsed time: 5.099 msecs"
{:1 {:1 0, :2 4, :3 9}, :2 {:1 4, :2 0, :3 5}, :3 {:2 5, :3 0, :1 9}}
This leaves the original value in min-dist intact.
Upvotes: 10