Michael Barton
Michael Barton

Reputation: 9556

How to stream a large CSV response from a compojure API so that the whole response is not held in memory at once?

I'm new to using compojure, but have been enjoying using it so far. I'm currently encountering a problem in one of my API endpoints that is generating a large CSV file from the database and then passing this as the response body.

The problem I seem to be encountering is that the whole CSV file is being kept in memory which is then causing an out of memory error in the API. What is the best way to handle and generate this, ideally as a gzipped file? Is it possible to stream the response so that a few thousand rows are returned at a time? When I return a JSON response body for the same data, there is no problem returning this.

Here is the current code I'm using to return this:

(defn complete
  "Returns metrics for each completed benchmark instance"
  [db-client response-format]
  (let [benchmarks  (completed-benchmark-metrics {} db-client)]
    (case response-format
      :json  (json-grouped-output field-mappings benchmarks)
      :csv   (csv-output benchmarks))))

(defn csv-output [data-seq]
  (let [header (map name (keys (first data-seq)))
        out    (java.io.StringWriter.)
        write  #(csv/write-csv out (list %))]
    (write header)
    (dorun (map (comp write vals) data-seq))
    (.toString out)))

The data-seq is the results returned from the database, which I think is a lazy sequence. I'm using yesql to perform the database call.

Here is my compojure resource for this API endpoint:

(defresource results-complete [db]
  :available-media-types  ["application/json" "text/csv"]
  :allowed-methods        [:get]
  :handle-ok              (fn [request]
                            (let [response-format (keyword (get-in request [:request :params :format] :json))
                                  disposition     (str "attachment; filename=\"nucleotides_benchmark_metrics." (name response-format) "\"")
                                  response        {:headers {"Content-Type" (content-types response-format)
                                                             "Content-Disposition" disposition}
                                                   :body    (results/complete db response-format)}]
                              (ring-response response))))

Upvotes: 4

Views: 1710

Answers (3)

Michael Barton
Michael Barton

Reputation: 9556

Thanks to all the suggestion that were provided in this thread, I was able to create a solution using piped-input-stream:

(defn csv-output [data-seq]
  (let [headers     (map name (keys (first data-seq)))
        rows        (map vals data-seq)
        stream-csv  (fn [out] (csv/write-csv out (cons headers rows))
                              (.flush out))]
    (piped-input-stream #(stream-csv (io/make-writer % {})))))

This differs from my solution because it does not realise the sequence using dorun and does not create a large String object either. This instead writes to a PipedInputStream connection asynchronously as described by the documentation:

Create an input stream from a function that takes an output stream as its argument. The function will be executed in a separate thread. The stream will be automatically closed after the function finishes.

Upvotes: 7

Minh Tuan Nguyen
Minh Tuan Nguyen

Reputation: 1054

I was also struggling with the streaming of large csv file. My solution was to use httpkit-channel to stream every single line of the data-seq to the client and then close the channel. My solution looks like that:

[org.httpkit.server :refer :all]

(fn handler [req]
    (with-channel req channel (let [header "your$header"
                                    data-seq ["your$seq-data"]]
                                    (doseq [line (cons header data-seq)]
                                       (send! channel
                                              {:status  200
                                              :headers {"Content-Type" "text/csv"}
                                              :body    (str line "\n")}
                                              false))
                                    (close channel))))

Upvotes: 1

l0st3d
l0st3d

Reputation: 2968

Your csv-output function completely realises the dataset and turns it into a string. To lazily stream the data, you'll need to return something other than a concrete data type like a String. This suggests ring supports returning a stream, that can be lazily realised by Jetty. The answer to this question might prove useful.

Upvotes: 3

Related Questions