frank

streaming multiple output lines from a single input line using Clojure data.csv

I am reading a CSV file, processing each input line, appending the output to the input, and writing the results to an output CSV. It seems pretty straightforward, and I am using Clojure data.csv. However, I ran into a nuance in the output that doesn't match anything I've encountered before in Clojure, and I cannot figure it out: each input line produces 0 to N output lines, and I cannot figure out how to stream these down to the calling fn.

Here is the form that is processing the file:

(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

(defn process-file
  [from to]
  (let [ctr (atom 0)]
    (with-open [r (io/reader from)
                w (io/writer to)]
      (some->> (csv/read-csv r)
               (map #(process-line % ctr))
               (csv/write-csv w)))))

And here is the form that processes each line. It returns 0 to N lines, each of which needs to be written to the output CSV:

(defn process-line
  [line ctr]
  (swap! ctr inc)
  (->> (apps-for-org (first line))
       (reduce #(conj %1 (add-results-to-input line %2)) [])))
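To see what the `map` step in process-file actually hands to write-csv, here is a minimal sketch. `apps-for-org` and `add-results-to-input` are hypothetical stubs (their real implementations aren't shown, so the org/app data is made up); process-line is copied from above:

```clojure
;; Hypothetical stub: org name -> its apps (0 to N of them). Made-up data.
(defn apps-for-org [org]
  (get {"acme" ["app1" "app2"]} org []))

;; Hypothetical stub: append one app's result to the input row.
(defn add-results-to-input [line app]
  (conj line app))

;; process-line as in the question: one input row -> vector of 0..N output rows.
(defn process-line
  [line ctr]
  (swap! ctr inc)
  (->> (apps-for-org (first line))
       (reduce #(conj %1 (add-results-to-input line %2)) [])))

(def ctr (atom 0))
(def input [["acme" "x"] ["empty-co" "y"]])

;; doall forces the lazy seq so the result (and ctr) can be inspected.
(def nested (doall (map #(process-line % ctr) input)))
;; nested => ([["acme" "x" "app1"] ["acme" "x" "app2"]] [])
```

Each element of `nested` is itself a collection of rows (possibly empty), one level deeper than the single rows of cell values that write-csv expects.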

Upvotes: 0

Answers (2)

Carcigenicate

Honestly, I didn't fully understand your question, but from your comment, I seem to have answered it.

If you want to run csv/write-csv on each collection of rows that process-line returns, you could just map over those collections:

(some->> (csv/read-csv r)
         (map #(process-line % ctr))
         (mapv #(csv/write-csv w %)))

Note my use of mapv, since the mapping function performs side effects. With plain map the result is a lazy sequence, so depending on the context (for example, if nothing realizes it before with-open closes the writer), the writes may never happen.
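To make the laziness point concrete, here is a small sketch. It swaps csv/write-csv for a hypothetical `write-rows!` helper (plain comma-joined lines, no quoting) so it runs with clojure.core alone, and a java.io.StringWriter stands in for the output file:

```clojure
(require '[clojure.string :as string])

;; Hypothetical stand-in for csv/write-csv so this runs without data.csv:
;; writes each row as one comma-joined line (no quoting or escaping).
(defn write-rows! [^java.io.Writer w rows]
  (doseq [row rows]
    (.write w (str (string/join "," row) "\n"))))

;; Three groups of rows, as process-line might return them (the middle one is empty).
(def groups [[["a" "1"] ["a" "2"]] [] [["b" "3"]]])

;; Lazy: map returns an unrealized seq; discarding it means nothing is written.
(def lazy-w (java.io.StringWriter.))
(do (map #(write-rows! lazy-w %) groups) nil)
(str lazy-w) ;; => ""

;; Eager: mapv realizes every element immediately, so all writes happen.
(def eager-w (java.io.StringWriter.))
(mapv #(write-rows! eager-w %) groups)
(str eager-w) ;; => "a,1\na,2\nb,3\n"
```

In process-file this matters because with-open closes the writer as soon as its body returns; a lazy seq that escapes the body would be realized, if at all, only after that point.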

It would arguably be more correct to use doseq, however:

(let [rows (some->> (csv/read-csv r)
                    (map #(process-line % ctr)))]

  (doseq [row rows] 
    (csv/write-csv w row)))

doseq makes it clear that the iteration is carried out for its side effects, not to produce a new (immutable) sequence. It is also eager, so every write happens before the doseq form returns.

Upvotes: 2

Taylor Wood

I think the problem you're running into is that your mapping function, process-line, returns a collection of zero-to-many rows. So when you map it over the input lines, you get a collection of collections, when what you want to send to write-csv is a single collection of rows. If that's the case, the fix is simply to change the map line to use mapcat:

(mapcat #(process-line % ctr))

mapcat is like map, but it concatenates the collections returned by each process-line call into a single sequence, so an input line that yields no rows simply contributes nothing to the output.
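A minimal sketch of the difference, using a hypothetical process-line stand-in (made-up org/app data) that keeps the question's `[line ctr]` signature and returns 0 to N rows per input line:

```clojure
;; Hypothetical stand-in: one input row -> vector of 0..N output rows.
(defn process-line [line ctr]
  (swap! ctr inc)
  (vec (for [app (get {"acme" ["app1" "app2"]} (first line) [])]
         (conj line app))))

(def ctr (atom 0))
(def input [["acme" "x"] ["empty-co" "y"] ["acme" "z"]])

;; map: one element per input line, each a collection of rows (nested).
(def nested (map #(process-line % ctr) input))
;; => ([["acme" "x" "app1"] ["acme" "x" "app2"]] [] [["acme" "z" "app1"] ["acme" "z" "app2"]])

;; mapcat: the per-line collections are concatenated into one seq of rows;
;; the empty result for "empty-co" contributes nothing.
(def flat (mapcat #(process-line % ctr) input))
;; => (["acme" "x" "app1"] ["acme" "x" "app2"] ["acme" "z" "app1"] ["acme" "z" "app2"])
```

`flat` has the shape write-csv expects (a seq of rows), so in process-file only the map line changes; the rest of the pipeline stays the same.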

Upvotes: 1
