rainkinz

Reputation: 10394

Read each entry lazily from a zip file

I want to read the entries of a zip file into a sequence of strings, if possible. Currently I'm doing something like this, for example to print out directory names:

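(import '[java.io FileInputStream]
        '[java.util.zip ZipInputStream])

;; Lazy seq of the remaining entries; realizing each element advances the stream.
(defn entries [zipfile]
  (lazy-seq
    (when-let [entry (.getNextEntry zipfile)]
      (cons entry (entries zipfile)))))

;; Calls f on every entry while the stream is still open.
(defn with-each-entry [fileName f]
  (with-open [z (ZipInputStream. (FileInputStream. fileName))]
    (doseq [e (entries z)]
      ; (println (.getName e))
      (f e)
      (.closeEntry z))))

(with-each-entry "tmp/my.zip"
  (fn [e] (when (.isDirectory e)
            (println (.getName e)))))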

However, this iterates through the entire zip file. How could I change it so that I can take just the first few entries, with something like:

(take 10 (zip-entries "tmp/my.zip"
  (fn [e] (if (.isDirectory e)
            (println (.getName e))))))

Upvotes: 1

Views: 517

Answers (1)

Magos

Reputation: 3014

This seems like a pretty natural fit for the new transducers in Clojure 1.7.
You build up the transformations you want as a transducer, using comp and the usual seq-transforming functions called without a seq/collection argument. For your example cases, that would be
(comp (map #(.getName %)) (take 10)) and
(comp (filter #(.isDirectory %)) (map #(-> % .getName println))).
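
As a minimal sketch of the same idea on plain numbers rather than zip entries (the name odd-incremented is made up for illustration), note that a composed transducer applies its steps left to right and can be handed to into or transduce unchanged:

```clojure
;; filter, map and take called without a collection each return a transducer.
(def odd-incremented
  (comp (filter odd?)   ; keep the odd inputs
        (map inc)       ; then increment each of them
        (take 3)))      ; then stop after three results

;; Collect the results into a vector.
(into [] odd-incremented (range 20))
;; => [2 4 6]

;; Or fold them with a reducing function; (+) supplies the initial value 0.
(transduce odd-incremented + (range 20))
;; => 12
```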

Composing with comp gives you a transducer, which you can use in a lot of ways. In this case you want to reduce it eagerly over the entries sequence (so that the entries are realized inside with-open), so you use transduce. The example zip data was made by zipping one of my Clojure project folders:

;; Assumes FileInputStream and ZipInputStream are imported and
;; entries is defined as in the question.
(with-open [z (-> "training-day.zip" FileInputStream. ZipInputStream.)]
  (let [transform (comp (map #(.getName %)) (take 10))]
    (transduce transform conj (entries z))))
;;return value: [".gitignore" ".lein-failures" ".midje-grading-config.clj" ".nrepl-port" ".travis.yml" "project.clj" "README.md" "target/" "target/classes/" "target/repl-port"]

Here I'm transducing with the reducing function conj, which builds a vector of the names. If you instead want the transducer to perform side effects and not return a value, you can use a reducing function like (constantly nil):

(with-open [z (-> "training-day.zip" FileInputStream. ZipInputStream.)]
  (let [transform (comp (filter #(.isDirectory %)) (map #(-> % .getName println)))]
    (transduce transform (constantly nil) (entries z))))

which gives output:

target/
target/classes/
target/stale/
test/
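
To check that take really stops reading the archive early, here is a rough, self-contained sketch. The helper make-sample-zip, the reads counter and counted-entries are all hypothetical names added for illustration; counted-entries has the same shape as the question's entries, plus a counter.

```clojure
(import '[java.io File FileInputStream FileOutputStream]
        '[java.util.zip ZipEntry ZipInputStream ZipOutputStream])

;; Hypothetical helper: write a throwaway zip containing n empty entries.
(defn make-sample-zip [n]
  (let [f (File/createTempFile "sample" ".zip")]
    (.deleteOnExit f)
    (with-open [out (ZipOutputStream. (FileOutputStream. f))]
      (dotimes [i n]
        (.putNextEntry out (ZipEntry. (str "file-" i ".txt")))
        (.closeEntry out)))
    f))

;; Counts how many entries were actually pulled from the stream.
(def reads (atom 0))

(defn counted-entries [^ZipInputStream z]
  (lazy-seq
    (when-let [entry (.getNextEntry z)]
      (swap! reads inc)
      (cons entry (counted-entries z)))))

(def first-three
  (with-open [z (ZipInputStream. (FileInputStream. (make-sample-zip 100)))]
    (transduce (comp (map #(.getName %)) (take 3))
               conj
               (counted-entries z))))

first-three
;; => ["file-0.txt" "file-1.txt" "file-2.txt"]
```

After this runs, @reads should be far below 100, since the reduction stops as soon as take signals that it has enough.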

A potential downside is that you'll probably have to incorporate .closeEntry calls into each transducer yourself to avoid holding on to those resources, because in the general case you can't know when a transducer has finished reading an entry.
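
One possible way around that, sketched here under assumptions rather than offered as the definitive approach, is to do the per-entry reading and the .closeEntry call in the step that produces each element, so the transducers only ever see plain values. The names entry-results, entry-text and sample are hypothetical, and the entry contents are assumed to be UTF-8 text:

```clojure
(require '[clojure.java.io :as io])
(import '[java.io ByteArrayOutputStream File FileInputStream FileOutputStream]
        '[java.util.zip ZipEntry ZipInputStream ZipOutputStream])

;; Hypothetical helper: calls (f z entry) while the stream is positioned
;; at that entry, closes the entry, and yields the result lazily.
(defn entry-results [^ZipInputStream z f]
  (lazy-seq
    (when-let [entry (.getNextEntry z)]
      (let [result (f z entry)]
        (.closeEntry z)
        (cons result (entry-results z f))))))

;; Reads the current entry's bytes without closing the underlying stream.
(defn entry-text [^ZipInputStream z ^ZipEntry entry]
  (let [buf (ByteArrayOutputStream.)]
    (io/copy z buf)
    {:name (.getName entry) :text (.toString buf "UTF-8")}))

;; Throwaway sample archive with three small text entries.
(def sample
  (let [f (File/createTempFile "sample" ".zip")]
    (.deleteOnExit f)
    (with-open [out (ZipOutputStream. (FileOutputStream. f))]
      (doseq [[entry-name text] [["a.txt" "alpha"] ["b.txt" "beta"] ["c.txt" "gamma"]]]
        (.putNextEntry out (ZipEntry. ^String entry-name))
        (.write out (.getBytes ^String text "UTF-8"))
        (.closeEntry out)))
    f))

(def texts
  (with-open [z (ZipInputStream. (FileInputStream. sample))]
    (into [] (comp (map :text) (take 2)) (entry-results z entry-text))))

texts
;; => ["alpha" "beta"]
```

The reduction still has to finish inside with-open, which is why this uses into rather than returning the lazy sequence.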

Upvotes: 2
