mityakoval

Reputation: 908

rails - Exporting a huge CSV file consumes all RAM in production

So my app exports an 11.5 MB CSV file, and doing so consumes basically all of the RAM, which is never freed afterwards.

The data for the CSV is taken from the DB, and in the case mentioned above the whole thing is being exported.

I am using the standard CSV library from Ruby 2.4.1 in the following fashion:

export_helper.rb:

CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
  data = Model.scope1(param).scope2(param).includes(:model1, :model2)
  data.each do |item|
    file << [
      item.method1,
      item.method2,
      item.method3
    ]
  end
  # repeat for other models - approx. 5 other similar loops
end

and then in the controller:

generator = ExportHelper::ReportGenerator.new
generator.full_report
respond_to do |format|
  format.csv do
    send_file(
      "#{Rails.root}/full_report.csv",
      filename: 'full_report.csv',
      type: :csv,
      disposition: :attachment
    )
  end
end

After a single request the puma processes occupy 55% of the whole server's RAM and stay like that until the server eventually runs out of memory completely.

For instance, in this article generating a million-line, 75 MB CSV file required only 1 MB of RAM. But there is no DB querying involved there.

The server has 1015 MB RAM + 400 MB of swap memory.

So my questions are: why does exporting the CSV consume this much memory, and how do I keep the export from eating all of the server's RAM?

Thanks in advance!

Upvotes: 4

Views: 4318

Answers (3)

lzap

Reputation: 17174

Beware though: you can easily improve the ActiveRecord side, but when the response is sent through Rails it will still all end up in a memory buffer in the Response object: https://github.com/rails/rails/blob/master/actionpack/lib/action_dispatch/http/response.rb#L110

You also need to make use of the live streaming feature to pass the data to the client directly without buffering: https://guides.rubyonrails.org/action_controller_overview.html#live-streaming-of-arbitrary-data
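A minimal sketch of that approach, reusing the Model scope and method names from the question; ReportsController and the batched query are assumptions, not something from this answer:

require 'csv'

class ReportsController < ApplicationController
  include ActionController::Live

  def full_report
    # Headers must be set before the first write to the stream.
    response.headers['Content-Type'] = 'text/csv'
    response.headers['Content-Disposition'] = 'attachment; filename="full_report.csv"'

    # Each row goes straight to the socket, so the full CSV never
    # accumulates in the Response object's buffer.
    Model.scope1(param).find_each do |item|
      response.stream.write(CSV.generate_line([item.method1, item.method2]))
    end
  ensure
    # Always close the stream, or the connection is left open.
    response.stream.close
  end
end

Note that ActionController::Live needs a threaded server such as Puma, and behind a buffering proxy you may also need the X-Accel-Buffering header shown in the answer below.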

Upvotes: 0

eikes

Reputation: 5061

Instead of each you should be using find_each, which is made specifically for cases like this: it instantiates the models in batches (1,000 records by default) and releases each batch for garbage collection afterwards, whereas each instantiates all of them at once.

CSV.open('full_report.csv', 'wb', encoding: 'UTF-8') do |file|
  Model.scope1(param).find_each do |item|
    file << [
      item.method1
    ]
  end
end

Furthermore, you should stream the CSV instead of writing it to memory or disk before sending it to the browser:

format.csv do
  headers["Content-Type"] = "text/csv"
  headers["Content-disposition"] = "attachment; filename=\"full_report.csv\""

  # streaming_headers
  # nginx doc: Setting this to "no" will allow unbuffered responses suitable for Comet and HTTP streaming applications
  headers['X-Accel-Buffering'] = 'no'
  headers["Cache-Control"] ||= "no-cache"

  # Rack::ETag 2.2.x no longer respects 'Cache-Control'
  # https://github.com/rack/rack/commit/0371c69a0850e1b21448df96698e2926359f17fe#diff-1bc61e69628f29acd74010b83f44d041
  headers["Last-Modified"] = Time.current.httpdate

  headers.delete("Content-Length")
  response.status = 200

  header = ['Method 1', 'Method 2']
  csv_options = { col_sep: ";" }

  csv_enumerator = Enumerator.new do |y|
    y << CSV::Row.new(header, header).to_s(csv_options)
    Model.scope1(param).find_each do |item|
      y << CSV::Row.new(header, [item.method1, item.method2]).to_s(csv_options)
    end
  end

  # setting the body to an enumerator, rails will iterate this enumerator
  self.response_body = csv_enumerator
end

Upvotes: 11

R. Sierra

Reputation: 1204

Apart from using find_each, you should try running the ReportGenerator code in a background job with ActiveJob. Since background jobs run in separate processes, the memory is released back to the OS when those processes are killed.

So you could try something like this:

  • A user requests some report (CSV, PDF, Excel)
  • Some controller enqueues a ReportGeneratorJob, and a confirmation is displayed to the user
  • The job is performed and an email is sent with the download link/file
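A minimal sketch of that flow, reusing ExportHelper::ReportGenerator from the question; ReportGeneratorJob, ReportMailer and current_user are illustrative names, not part of this answer:

class ReportGeneratorJob < ApplicationJob
  queue_as :default

  def perform(user_id)
    user = User.find(user_id)
    # The heavy lifting happens in the worker process, so its memory
    # is returned to the OS when that process exits or is restarted.
    ExportHelper::ReportGenerator.new.full_report
    ReportMailer.full_report_ready(user).deliver_now
  end
end

# In the controller: enqueue the job and respond immediately.
def create
  ReportGeneratorJob.perform_later(current_user.id)
  redirect_to reports_path, notice: 'Your report is being generated.'
end

ReportMailer.full_report_ready is a hypothetical mailer action; the job only needs some way to notify the user once the file is ready.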

Upvotes: 4
