user1402671

Streaming Download while File is Created

I was wondering if anyone knows how to stream a file download while the file is still being created.

I'm generating a huge CSV export, and right now it takes a couple of minutes for the file to be created. Only once it's created does the browser start downloading it.

I want to change this so that the browser starts downloading the file while it's being created. With a progress bar to look at, users will be more willing to wait. Even if it only reports “Unknown time remaining”, they are less likely to get impatient when they can see data steadily arriving.

NOTE: I'm using Rails version 3.0.9.

Here is my code:

def users_export
  # FasterCSV.open below creates the file itself, so no separate File.new (and leaked handle) is needed
  @todays_date = Time.now.strftime("%m-%d-%Y")
  @outfile = @todays_date + ".csv"

  @users = User.select('id, login, email, last_login, created_at, updated_at')

  FasterCSV.open("users_export.csv", "w+") do |csv|
    csv << [ @todays_date ]

    csv << [ "id","login","email","last_login", "created_at", "updated_at" ]
    @users.find_each(:batch_size => 100 ) do |u|
      csv << [ u.id, u.login, u.email, u.last_login, u.created_at, u.updated_at ]
    end
  end

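  # send_file builds the Content-Disposition header itself, so the
  # filename goes in its own option; the trailing comma is removed too
  send_file "users_export.csv",
    :type => 'text/csv; charset=iso-8859-1; header=present',
    :disposition => "attachment",
    :filename => @outfile,
    :stream => true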
end

Upvotes: 2

Views: 1454

Answers (1)

patrickmcgraw

Reputation: 2495

I sought an answer to this question several weeks ago. I thought that if data was being streamed back to the client, then maybe Heroku wouldn't time out one of my long-running API calls after 30 seconds. I even found an answer that looked promising:

format.xml do
  self.response_body = lambda do |response, output|
    output.write("<?xml version='1.0' encoding='UTF-8' ?>")
    output.write("<results type='array' count='#{@report.count}'>")
    @report.each do |result|
      output.write("
        <result>
          <element-1>Data-1</element-1>
          <element-2>Data-2</element-2>
          <element-n>Data-N</element-n>
        </result>
      ")
    end
    output.write("</results>")
  end
end

The idea is that the response_body lambda gets direct access to the output buffer going back to the client. In practice, however, Rack has its own ideas about what data should be sent back and when. Furthermore, this response_body-as-lambda pattern is deprecated in newer versions of Rails, and I think support is dropped outright in 3.2. You could get your hands dirty in the middleware stack and write this output as a Rails Metal, but...
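For what it's worth, the Rack contract only asks that a response body respond to `each` and yield strings, so the same idea can be expressed without the lambda form. Below is a minimal sketch of such a body for a CSV export. `CsvRowStream`, `DemoUser`, and the three-column header are names I made up for illustration, and the demo records stand in for a real model so the sketch runs without Rails.

```ruby
require "csv"

# Hypothetical helper: a Rack-style body that yields one CSV line at a time
# instead of building the whole export in memory or on disk first.
class CsvRowStream
  HEADER = %w[id login email].freeze

  # `records` is any enumerable of objects responding to id, login and email.
  def initialize(records)
    @records = records
  end

  # A Rack server calls `each` and writes every yielded string to the client.
  def each
    yield CSV.generate_line(HEADER)
    @records.each do |u|
      yield CSV.generate_line([u.id, u.login, u.email])
    end
  end
end

# Stand-in for a model class (an assumption, not part of the question's app).
DemoUser = Struct.new(:id, :login, :email)
demo_users = [DemoUser.new(1, "ann", "ann@example.com"),
              DemoUser.new(2, "bob", "bob@example.com")]

# Collect the chunks the way a server would receive them, one per row.
chunks = []
CsvRowStream.new(demo_users).each { |line| chunks << line }
```

On a Rails version that accepts an enumerable response body, an object like this could be assigned to `self.response_body`. Whether the rows actually reach the browser incrementally still depends on the app server and on any middleware that buffers the response, which is the same caveat as with the lambda.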

If I may be so bold, I strongly suggest refactoring this work to a background job. The benefits are many:

  • Your users will not have to just sit and wait for the download. They can request a file and then browse away to other more exciting portions of your site.

  • The file generation and download will be more robust. For example, if a user loses internet connectivity, even briefly, three minutes into a download under the current setup, they lose all that time and need to start over. If the file is generated in the background on your site, they only need a connection for as long as it takes to get the job started.

  • It will decrease the load on your front-end processes and may decrease the load on your site in total if the background job generates the files and you provide links to the generated files on a page within your app. Chances are one file generation could serve several downloads.

  • Since practically all Rails web servers are single-threaded and synchronous out of the box, an entire app server process is tied up on this one file download every time a user requests it. This makes it easy for users to accidentally carry out a DoS attack on your site.

  • You can ship the background-generated file to external storage such as S3 (optionally behind a CDN) and perhaps gain a performance boost on the download speed your users see.

  • When the background process is done you can notify the user via email so they don't even have to be at the computer where they initiated the file generation in order to know it's done.

  • Once you have a background job system in your application you will find many more uses for it, such as sending email or updating search indexing.
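To make the background job idea a bit more concrete, here is a rough sketch of what the job itself might look like, independent of any particular queue library. `UsersExportJob`, `ExportRow`, the `.part` suffix, and the notifier callable are all names I made up for illustration; in a real app the records would come from the model and the notifier would send the email or update a status page.

```ruby
require "csv"
require "tmpdir"

# Hypothetical job object; a queue library's worker would call `perform`.
class UsersExportJob
  HEADER = %w[id login email].freeze

  def initialize(records, path, notifier)
    @records  = records   # enumerable of objects with id, login and email
    @path     = path      # where the finished CSV should end up
    @notifier = notifier  # callable invoked with the path once the file is ready
  end

  def perform
    tmp_path = "#{@path}.part"
    CSV.open(tmp_path, "w") do |csv|
      csv << HEADER
      @records.each { |u| csv << [u.id, u.login, u.email] }
    end
    # Rename only after a complete write so nobody downloads a partial file.
    File.rename(tmp_path, @path)
    @notifier.call(@path)
  end
end

# Demo run with stand-in data and a notifier that just records the path.
ExportRow = Struct.new(:id, :login, :email)
rows = [ExportRow.new(1, "ann", "ann@example.com")]
notified = []
export_path = File.join(Dir.mktmpdir, "users_export.csv")
UsersExportJob.new(rows, export_path, ->(path) { notified << path }).perform
export_text = File.read(export_path)
```

Writing to a temporary name and renaming at the end is one way to make sure a download link never points at a half-written file. The controller action then only has to enqueue the job and return immediately.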

Sorry that this doesn't really answer your original question. But I strongly believe this is a better overall solution.

Upvotes: 1
