Reputation:
The documentation of math.combinatorics states that all functions return lazy sequences.
However if I try to run subsets with a lot of data,
(last (combinatorics/subsets (range 20)))
;OutOfMemoryError Java heap space clojure.lang.RT.cons (RT.java:559)
I get an OutOfMemory Error.
Running
(last (range))
burns CPU, but it doesn't return an error.
Clojure doesn't seem to "hold on the head" like explained in this Stack Overflow question.
Why is this happening and how I can use bigger ranges in subsets?
Update
It seems to work on some peoples computers as the comments suggest. So I will post my system configuration
I run a Mac (10.8.3) and installed Clojure (1.5.1) with Homebrew.
My Java version is:
% java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06-451-11M4406)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01-451, mixed mode)
I didn't change any of the default settings. I also reinstalled all dependencies, by deleting the ~/.m2
folder.
My projects.clj.
And the command I used was this
% lein repl
nREPL server started on port 61774
REPL-y 0.1.10
Clojure 1.5.1
=> (require 'clojure.math.combinatorics)
nil
=> (last (clojure.math.combinatorics/subsets (range 20)))
OutOfMemoryError Java heap space clojure.lang.RT.cons (RT.java:570)
or
OutOfMemoryError Java heap space clojure.math.combinatorics/index-combinations/fn--1148/step--1164 (combinatorics.clj:64)
I tested the problem on a colleague's laptop, and he had the same issue, but he was on a Mac, too.
Upvotes: 7
Views: 677
Reputation: 5017
The problem is neither with apply
, nor with concat
, nor with mapcat
.
dAni's answer, where he reimplements mapcat
, does accidentally results in fixing the problem, but the reasoning behind it is not correct. Also, his answer points to an article, where the author says "I believe the problem lies in apply". This is clearly wrong, as I am about to explain below. Finally, the issue at hand is not related to this other one, where non-lazy evaluation is indeed caused by apply
.
If you look closely, both dAni and the author of that article implement mapcat
without the use of the map
function. I will show in the next example that the issue is related to the way the map
function is implemented.
To demonstrate that the issue is not related to either apply
or concat
see the following implementation of mapcat
. It uses both concat
and apply
, still it achieves full laziness:
(defn map
([f coll]
(lazy-seq
(when-let [s (seq coll)]
(cons (f (first s)) (map f (rest s)))))))
(defn mapcat [f & colls]
(apply concat (apply map f colls)))
(defn range-announce! [x]
(do (println "Returning sequence of" x)
(range x)))
;; new fully lazy implementation prints only 5 lines
(nth (mapcat range-announce! (range)) 5)
;; clojure.core version still prints 32 lines
(nth (clojure.core/mapcat range-announce! (range)) 5)
The full laziness in the above code is achieved by reimplementing the map
function. In fact mapcat
is implemented exactly the same way as in clojure.core
, yet it works fully lazy. The above map
implementation is a bit simplified for the sake of the example, as it only supports a single parameter, but even implementing it with the whole variadic signature will work the same: full laziness. So we showed that the problem here is neither with apply
nor with concat
. Also, we showed that the real problem must be related to how the map
function is implemented in clojure.core
. Let's take a look at it:
(defn map
([f coll]
(lazy-seq
(when-let [s (seq coll)]
(if (chunked-seq? s)
(let [c (chunk-first s)
size (int (count c))
b (chunk-buffer size)]
(dotimes [i size]
(chunk-append b (f (.nth c i))))
(chunk-cons (chunk b) (map f (chunk-rest s))))
(cons (f (first s)) (map f (rest s))))))))
It can be seen that the clojure.core
implementation is exactly the same as our "simplified" version before, except for the true
branch of the if (chunked-seq? s)
expression. Essentially clojure.core/map
has a special case for handling input sequences which are chunked sequences.
Chunked sequences compromise laziness by evaluating in chunks of 32 instead of strictly one at a time. This becomes painfully evident when evaluating deeply nested chunked sequences, like in the case of subsets
. Chunked sequences were introduced in Clojure 1.1, and many core functions were upgraded to recognize and process them differently, including map
. The main purpose of introducing them was to improve performance in certain stream-processing scenarios, but arguably they make it significantly harder to reason about the laziness characteristics of a program. You can read up on chunked sequences here and here. Also check out this question here.
The real problem is that range
returns a chunked seq, and is used internally by subsets
. The fix recommended by David James patches subsets
to unchunk the sequence created by range
internally.
Upvotes: 2
Reputation: 32705
This issue has been raised on the project's ticket tracker: Clojure JIRA: OutOfMemoryError with combinatorics/subsets. There, you can find a patch by Andy Fingerhut. It worked for me. Note that the patch is different than the mapcat variation suggested by another answer.
Upvotes: 1
Reputation: 91857
You want to compute the power set of a set with 1000 elements? You know that's going to have 2^1000 elements, right? That is so large I can't even find a good way to describe how enormous it is. If you're trying to work with such a set, and you can do so lazily, your problem won't be memory: it will be computation time. Let's say you have a supercomputer with infinite memory, capable of processing a trillion items per nanosecond: that's 10^21 items processed per second, or about 10^29 items per year. Even this supercomputer will take much, much longer than the lifetime of the universe to work through the items of (subsets (range 1000))
.
So I'd say, stop worrying about the memory usage of this collection, and work on an algorithm that doesn't involve walking through sequences with more elements than there are atoms in the universe.
Upvotes: 3
Reputation: 8593
The issue is that subsets
uses mapcat
, and mapcat
is not lazy enough as it uses apply which realizes and holds some of the elements to be concatenated. See a very nice explanation here. Using the lazier mapcat version of that link in subsets should fix the issue:
(defn my-mapcat
[f coll]
(lazy-seq
(if (not-empty coll)
(concat
(f (first coll))
(my-mapcat f (rest coll))))))
(defn subsets
"All the subsets of items"
[items]
(my-mapcat (fn [n] (clojure.math.combinatorics/combinations items n))
(range (inc (count items)))))
(last (subsets (range 50))) ;; this will take hours to compute, good luck with it!
Upvotes: 3
Reputation: 7328
In the absence of command line arguments, the startup heap size parameters of a JVM are determined by various ergonomics
The defaults (JDK 6) are
initial heap size memory / 64 maximum heap size MIN(memory / 4, 1GB)
but you can force an absolute value using the -Xmx
and -Xms
args
You can find more detail here
Upvotes: 0