I'm doing data analysis in Clojure. In the course of this analysis I'm interacting with a SQLite file containing about three hundred megabytes of data. Some queries on this dataset, like
(select crawls)
return very long lists of crawl information. However, other queries that target larger columns fail with:
OutOfMemoryError Java heap space org.sqlite.NativeDB.column_text (NativeDB.java:-2)
I can trigger this in Korma with something as simple as:
(select authors)
So does:
=> (first (select stories))
OutOfMemoryError Java heap space org.sqlite.NativeDB.column_text (NativeDB.java:-2)
Does anyone know how I can go about fixing this issue? This is my first large data analysis project.
The queries should be returning a lazy result, so you can work with the rows without needing the whole result set in the heap. If you run a large select from the REPL, though, the implicit print realizes the whole lazy seq in one go, and that is what exhausts the heap.
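If you just want to poke at results interactively, one standard Clojure safeguard (not Korma-specific) is to cap how many collection elements the printer will realize:

;; Cap REPL printing at 20 elements (an arbitrary choice); the implicit
;; print then realizes at most 20 rows instead of the whole lazy seq.
(set! *print-length* 20)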
The correct way to handle this is to write a reduce (or perhaps a map/doseq combination) that works with one result at a time without holding onto earlier elements. Watch out for "holding onto the head": don't bind the head of the lazy seq to a name, or else the whole sequence will be retained in the heap.
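As a minimal sketch of that pattern, assuming a stories entity like the one in the question, plus a hypothetical crawl-data.sqlite file and hypothetical :id and :body columns:

(ns example.analysis
  (:require [korma.core :refer [defentity select]]
            [korma.db :refer [defdb sqlite3]]))

;; Hypothetical database file and entity, standing in for the question's setup.
(defdb crawl-db (sqlite3 {:db "crawl-data.sqlite"}))
(defentity stories)

;; Hypothetical per-row function: extract one small value from each row so
;; the large text column can be discarded as soon as the row is processed.
(defn story-length [row]
  (count (:body row)))

;; reduce consumes one row at a time; no name is bound to the head of the
;; result seq, so rows that have already been folded in can be GC'd.
(defn total-story-length []
  (reduce (fn [total row] (+ total (story-length row)))
          0
          (select stories)))

;; For side effects only, doseq also walks the seq without retaining it.
(defn print-story-lengths []
  (doseq [row (select stories)]
    (println (:id row) (story-length row))))

Whether the rows are actually streamed from SQLite lazily also depends on the JDBC driver and its fetch settings, so treat this as the shape of the consuming code rather than a guarantee.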