Reputation: 10395
(->> "/Users/micahsmith/printio/gooten-import-ai/jupyter/data"
     File.
     file-seq
     (filter #(-> ^File % .getAbsolutePath (str-contains? ".json")))
     (mapcat (fn [^File file]
               (with-open [rdr (io/reader file)]
                 (line-seq rdr)))))
I'm trying to read a directory of JSON files line by line, lazily, so that I can perform an operation lazily on the data.
I keep getting java.io.IOException: Stream closed. How can I consume this without closing the reader too early?
Upvotes: 1
Views: 488
Reputation: 91857
The with-open macro is designed to discourage you from doing this, because file handles and other operating system resources are the sort of thing you should handle carefully instead of lazily. You are intended to do all processing of the file contents within the dynamic scope of your with-open. So, instead of returning a lazy sequence, you should accept a function as an argument, and call that function on the lazy sequence while still within the scope of with-open. That function should of course not return another lazy sequence, but instead process its entire input before returning.
So the typical use for such a thing is like this:
(defn process-file [filename process]
  (with-open [f (io/reader filename)]
    (process (line-seq f))))
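As a minimal sketch of what "process its entire input before returning" can look like in practice, the two callers below both realize every line while the reader is still open. The names upper-lines and upper-lines-forced are hypothetical and chosen only for illustration; process-file is repeated so the block is self-contained.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn process-file [filename process]
  (with-open [f (io/reader filename)]
    (process (line-seq f))))

;; Hypothetical caller: mapv is eager, so every line is read and
;; transformed before with-open closes the reader.
(defn upper-lines [filename]
  (process-file filename #(mapv str/upper-case %)))

;; Hypothetical caller: map alone would return an unrealized lazy
;; sequence, and consuming it after with-open exits would hit the
;; closed reader. doall forces it while the reader is still open.
(defn upper-lines-forced [filename]
  (process-file filename #(doall (map str/upper-case %))))
```

Either form returns fully realized data, so nothing touches the reader after it has been closed.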
It's a little more complicated when you have a list of files, each needing its own with-open: you can't just call process once. One thing you could do is return a list of the results of calling process on each file:
(defn process-files [filenames process]
  (for [filename filenames]
    (with-open [f (io/reader filename)]
      (process (line-seq f)))))
Then if you need to do some global operation on that, you can reduce over the result of process-files.
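For instance, here is a rough sketch of reducing over per-file results to count lines across all files. The name total-lines is hypothetical; process-files is repeated so the block is self-contained. Note that for is itself lazy, so each file is opened only when reduce asks for its result, and it is closed again before the next one is opened.

```clojure
(require '[clojure.java.io :as io])

(defn process-files [filenames process]
  (for [filename filenames]
    (with-open [f (io/reader filename)]
      (process (line-seq f)))))

;; Hypothetical helper: count is eager, so each file is fully read
;; inside its own with-open; reduce then sums the per-file counts.
(defn total-lines [filenames]
  (reduce + (process-files filenames count)))
```

Because count returns a plain number rather than a sequence, nothing lazy escapes the scope of with-open.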
Upvotes: 2
Reputation: 45736
The problem is that with-open calls .close when execution leaves the scope it encloses, but the lines haven't necessarily all been read by that point.
My solution is probably an abusive abomination that should never have seen the light of day, but here's the idea: create a "lazy-seq" that just calls .close, and concatenate it to the end of the line-seq list:
(defn lazy-lines [^File file]
  (let [rdr (io/reader file)]
    (lazy-cat (line-seq rdr)
              (do (.close rdr)
                  nil)))) ; Explicit nil to indicate termination

(defn get-lines [^String path]
  (->> path
       (File.)
       (file-seq)
       (filter #(-> ^File % (.getAbsolutePath) (clojure.string/includes? ".json")))
       (mapcat lazy-lines)))
From my quick testing with files on my Desktop, it appears to work. If you add a println into the terminating lazy-seq, it prints as expected, so the file is being closed.
I'm hesitant to suggest this solution though, as it relies on carrying out side effects inside a lazy sequence, which I've been conditioned to "feel wrong" for obvious reasons. The major downside of this method is that the file won't be closed unless the entire sequence is evaluated, and the file will stay open the entire time until the end is reached. Given the constraints though, I don't see how either of these problems could be avoided.
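I realized I was using lazy-cat slightly wrong. I had an extra, unnecessary lazy-seq wrapper. It's fixed now. You could also just use something like
(concat (line-seq rdr)
        (lazy-seq (.close rdr)
                  nil))
instead of lazy-cat. Note that this uses plain concat rather than apply concat: apply would treat the closing lazy-seq as the argument list and realize it immediately, closing the reader before the lines are read.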
Upvotes: 1