Reputation: 195
I'm learning Clojure and as an exercise I wanted to write something like the unix "comm" command.
To do this, I read the contents of each file into a set, then use difference/intersection to show the exclusive/common lines.
After a lot of repl-time I came up with something like this for the set creation part:
(def contents (ref #{}))
(doseq [line (read-lines "/tmp/a.txt")]
  (dosync (ref-set contents (conj @contents line))))
(I'm using duck-streams/read-lines to seq the contents of the file).
This is my first stab at any kind of functional programming or Lisp/Clojure. For instance, I couldn't understand why, when I did a conj on the set, the set was still empty. This led me to learning about refs.
Upvotes: 4
Views: 1195
Reputation: 72926
Clojure 1.3:
user> (require '[clojure.java [io :as io]])
nil
user> (line-seq (io/reader "foo.txt"))
("foo" "bar" "baz")
user> (into #{} (line-seq (io/reader "foo.txt")))
#{"foo" "bar" "baz"}
line-seq gives you a lazy sequence where each item in the sequence is a line in the file. into dumps it all into a set. To do what you were trying to do (add each item one by one into a set), rather than doseq and refs, you could do:
user> (reduce conj #{} (line-seq (io/reader "foo.txt")))
#{"foo" "bar" "baz"}
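As for why the set was still empty after your conj: Clojure's sets are immutable, so conj returns a new set and leaves the original untouched. A minimal sketch of that, where the names s, s2 and s3 are only for illustration:

```clojure
(def s #{})

;; conj does not modify s; it returns a new set with the item added.
(def s2 (conj s "foo"))

(empty? s)            ; s is still the empty set
(contains? s2 "foo")  ; the new set has the item

;; reduce feeds each new set into the next conj call, so this is
;; equivalent to (conj (conj (conj #{} "foo") "bar") "baz").
(def s3 (reduce conj #{} ["foo" "bar" "baz"]))
```

That is why no ref is needed here: each step hands its result to the next one instead of updating a shared variable.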
Note that the Unix comm command compares two sorted files, which is likely more efficient than set intersection: it can walk both files in a single pass without holding either one in memory.
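If you want to try the sorted approach, here is a rough sketch of the merge-style walk. comm-sorted is a hypothetical helper, not a library function, and it assumes both inputs are already sorted according to Clojure's compare (which is not necessarily the same ordering the real comm uses for your locale):

```clojure
(defn comm-sorted
  "Walks two sorted seqs of strings in step. Returns a map of the items
  found only in xs, only in ys, and in both."
  [xs ys]
  (loop [xs (seq xs), ys (seq ys), only-a [], only-b [], both []]
    (cond
      ;; One side is exhausted: whatever is left on the other side is unique to it.
      (nil? xs) {:only-a only-a :only-b (into only-b ys) :both both}
      (nil? ys) {:only-a (into only-a xs) :only-b only-b :both both}
      :else
      (let [x (first xs)
            y (first ys)
            c (compare x y)]
        (cond
          ;; x sorts before y, so x cannot appear later in ys.
          (neg? c) (recur (next xs) ys (conj only-a x) only-b both)
          ;; y sorts before x, so y cannot appear later in xs.
          (pos? c) (recur xs (next ys) only-a (conj only-b y) both)
          ;; Equal heads: the line is common to both inputs.
          :else    (recur (next xs) (next ys) only-a only-b (conj both x)))))))

(comm-sorted ["a" "b" "d"] ["b" "c" "d" "e"])
```

Each input is consumed front to back exactly once, so you could feed it two line-seqs directly. This version still collects the results into vectors, which a streaming version would print as it goes.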
Edit: Dave Ray is right: to avoid leaking open file handles, it's better to do this:
user> (with-open [f (io/reader "foo.txt")]
        (into #{} (line-seq f)))
#{"foo" "bar" "baz"}
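Putting that together with clojure.set gets you the rest of a set-based comm. This is only a sketch: line-set and comm-sets are names I made up, and the /tmp paths in the comment are placeholders. Note that into is eager, so the lazy line-seq is fully consumed before with-open closes the reader.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.set :as set])

(defn line-set
  "Reads every line of the file at path into a set, closing the reader afterwards."
  [path]
  (with-open [r (io/reader path)]
    ;; into realizes the whole lazy seq while the reader is still open.
    (into #{} (line-seq r))))

(defn comm-sets
  "Splits two sets of lines into those unique to a, unique to b, and shared."
  [a b]
  {:only-a (set/difference a b)
   :only-b (set/difference b a)
   :both   (set/intersection a b)})

;; Example call, assuming both files exist:
;; (comm-sets (line-set "/tmp/a.txt") (line-set "/tmp/b.txt"))
```

Unlike the real comm, this loses duplicate lines and the original ordering, since sets keep neither.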
Upvotes: 8
Reputation: 10789
I always read the file with slurp and then split the resulting string with re-seq, using whatever pattern my needs call for.
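For example, something along these lines. The regex and the name slurp-line-set are just one possible choice, not the only way to split: this pattern treats any run of non-newline characters as a line, so blank lines are dropped.

```clojure
(defn slurp-line-set
  "Reads the whole file into one string, then collects each non-empty line into a set."
  [path]
  ;; re-seq returns every match of the pattern; with no capture groups
  ;; each match is a plain string.
  (set (re-seq #"[^\r\n]+" (slurp path))))
```

Keep in mind that slurp loads the entire file into memory at once, so this suits small files better than large ones.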
Upvotes: 0