Joshua

Reputation: 547

I'm running out of heap space while using Clojure, what do I do?

I'm doing data analysis in Clojure. As part of that analysis I'm querying a SQLite file that holds about three hundred megabytes of data. Some queries on this dataset, like

(select crawls)

return really long lists of crawl information. However, other queries that target larger columns give me:

OutOfMemoryError Java heap space org.sqlite.NativeDB.column_text (NativeDB.java:-2)

I can trigger this in Korma with something as simple as:

(select authors)

As will:

=> (first (select stories))
OutOfMemoryError Java heap space  org.sqlite.NativeDB.column_text (NativeDB.java:-2)

Does anyone know how I can go about fixing this issue? This is my first large data analysis project.

Upvotes: 2

Views: 544

Answers (1)

noisesmith

Reputation: 20194

The queries should be returning a lazy result, so that you can work through the rows without needing to have the whole thing in the heap at once. If you do a large select from the REPL, though, the implicit print realizes the entire lazy seq in one go, and that is what runs you out of heap.
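To make the REPL-printing point concrete, here is a minimal sketch that caps how many elements get printed, so that only the front of a lazy seq is realized. The `preview` helper is a hypothetical name, and `(range)` merely stands in for a large lazy result; this only helps if the query result really is lazy.

```clojure
;; Hypothetical helper: render at most n elements of a (possibly huge) seq.
;; *print-length* limits how many items the printer walks, so the rest of
;; a lazy seq is never realized just for display.
(defn preview [rows n]
  (binding [*print-length* n]
    (pr-str rows)))

;; (range) is an infinite lazy seq standing in for a big query result.
(println (preview (range) 3))

;; Taking a bounded prefix is another way to peek at the data safely.
(println (pr-str (take 3 (range))))
```

At an interactive REPL, `(set! *print-length* 20)` has a similar effect for the rest of the session.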

The correct way to handle this is to write a reduce (or maybe a map / doseq combo) that works with one result at a time without holding onto older elements. Watch out for "holding onto the head": don't keep a reference to the start of the lazy seq (for example, by binding it to a var or a long-lived local), or else every element realized so far stays reachable and the whole thing ends up held in the heap.
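As a rough sketch of that advice, the following processes rows one at a time with reduce and with doseq. `fake-rows`, `total-text-length`, and `print-ids` are hypothetical names, and the generated maps only stand in for rows that would come from the database.

```clojure
;; Hypothetical stand-in for a lazy query result: rows are produced on
;; demand rather than all at once.
(defn fake-rows [n]
  (map (fn [i] {:id i :text (str "story " i)}) (range n)))

;; reduce visits one row at a time and keeps only the running total, so
;; rows that were already processed can be garbage collected.
(defn total-text-length [rows]
  (reduce (fn [acc row] (+ acc (count (:text row)))) 0 rows))

;; doseq is the side-effecting counterpart; it also walks the seq without
;; retaining earlier rows.
(defn print-ids [rows]
  (doseq [row rows]
    (println (:id row))))

;; The seq is passed straight in and never bound to a name, so nothing
;; holds onto its head.
(println (total-text-length (fake-rows 1000)))
(print-ids (fake-rows 3))
```

By contrast, something like `(def rows (fake-rows 1000000))` followed by a full traversal keeps every realized row reachable through `rows`, which is the "holding onto the head" problem described above.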

Upvotes: 4
