Arthur Camara
Arthur Camara

Reputation: 577

Clojure idiomatic and memory-efficient loop

I'm implementing a simple algorithm in Clojure that insists on blowing the memory, even with :jvm-opts ["-Xmx4G"] set on my project.clj. assuming the following data structures:

(def inf Double/POSITIVE_INFINITY)
(def min-dist (atom {:1 {:1 0 :2 4} :2 {:1 4 :2 0 :3 5} :3 {:2 5 :3 0}}))
(def vertexes [:1 :2 :3])

The following will run out of memory on bigger inputs (|vertexes| = 100):

(for [k vertexes i vertexes j vertexes]
  (do
    (println " " i " " j " "k)
    (let [s (+ (get-in @min-dist [i k] inf) (get-in @min-dist [k j] inf))]
      (if (> (get-in @min-dist [i j] inf) s) (swap! min-dist assoc-in [i j] s)))))

output:

OutOfMemoryError Java heap space  java.util.Arrays.copyOf (Arrays.java:2367)

I'm pretty sure that that is a reduce option here that will make everything clean and fast, but I just can't find it. It looks like swap!takes a lot of memory space, am I right?


Two bonus questions:

  1. if I remove the println line (as well as the do, of course), the code will run fast, but min-dist will not be updated, as if the loops were not executed. Why is it?

  2. The same behavior is seen when running with lein run, even with the println in there. Why?

Any help to a new Clojurist will be appreciated. =)

Upvotes: 2

Views: 349

Answers (1)

Arthur Ulfeldt
Arthur Ulfeldt

Reputation: 91567

your sub question #1 is the key.

for produces a lazy list so no work is done unless the result is actually read. if you want the result to be evaluated you could wrap the whole thing in a call to dorun which traverses the list without keeping the whole thing in memory. I refer to this situation in general as "being bitten by the lazy bug" and it happens to most Clojurians more often than they let on ;-)

user> @min-dist
{:1 {:1 0, :2 4}, :2 {:1 4, :2 0, :3 5}, :3 {:2 5, :3 0}}
user> (time
        (dorun (for [k vertexes i vertexes j vertexes]
                 (let [s (+ (get-in @min-dist [i k] inf) (get-in @min-dist [k j] inf))]
                  (if (> (get-in @min-dist [i j] inf) s) (swap! min-dist assoc-in [i j] s))))))
"Elapsed time: 4.272336 msecs"
nil
user> @min-dist
{:1 {:1 0, :2 4, :3 9}, :2 {:1 4, :2 0, :3 5}, :3 {:2 5, :3 0, :1 9}}

removing the println is a good idea because with a list of 100 vertexes that for expression (it's not a loop in the sense that word has in other languages) is going to run 1 million times (100 * 100 * 100), so printing it out will take a while.

and because i'm a sucker for bonus points: here it is using reduce:

user> (def min-dist {:1 {:1 0 :2 4} :2 {:1 4 :2 0 :3 5} :3 {:2 5 :3 0}})
#'user/min-dist
user> (time
       (->> (for [k vertexes i vertexes j vertexes]     ;; start by making a sequence of all the combinatioins of indexes
              [i j k])
            (reduce
             (fn [result [i j k]]                       ;; the reducer function takes the result so far and either modifies it 
               (let [s (+ (get-in result [i k] inf)     ;; or passes it through unchanged.
                          (get-in result [k j] inf))]
                 (if (> (get-in result [i j] inf) s)    ;; depending on this if expression here.
                   (assoc-in result [i j] s)            ;; pass on the changed value
                   result)))                            ;; pass on the original value, unchanged
             min-dist)))                                ;; this argument is the initial value.
                                                        ;; the ->> expression places the result of the for expression here. (as the last argument)
"Elapsed time: 5.099 msecs"
{:1 {:1 0, :2 4, :3 9}, :2 {:1 4, :2 0, :3 5}, :3 {:2 5, :3 0, :1 9}}

This leaves the original value in min-dist intact.

Upvotes: 10

Related Questions