Xullnn

Reputation: 405

After loading a url by open-uri, how to handle the generated Tempfile object?

I want to figure out how to download images from the internet and store them locally. Here's what I did:

require 'open-uri' # => true

file = URI.open "https://s3-ap-southeast-1.amazonaws.com/xxx/Snip20180323_40.png"
# => #<Tempfile:/var/folders/k0/.../T/open-uri20180524-60756-1r44uix>

Then I was confused about this Tempfile object. I found I can get the original url by:

file.base_uri
# => #<URI::HTTPS https://s3-ap-southeast-1.amazonaws.com/xxx/Snip20180323_40.png>

But I couldn't find a method that directly returns the original file name, Snip20180323_40.png.

  1. Is there a method that can directly get the original file name from a Tempfile object?
  2. What purpose are Tempfile objects mainly used for? Are they different from normal file objects such as: file_object = File.open('how_old.rb') # => #<File:how_old.rb>?
  3. Can I convert a Tempfile object to a File object?
  4. How can I write this Tempfile as the same name file in a local directory, for example /users/user_name/images/Snip20180323_40.png?

Upvotes: 1

Views: 1680

Answers (1)

Chris Heald

Reputation: 62648

  1. The original filename is only really available in the URL. Just take uri.path.split("/").last.
  2. Tempfiles are effectively Files, with the distinction that when a Tempfile is garbage collected, the underlying file is deleted.
  3. You can copy the underlying file with FileUtils.copy, or you can open the Tempfile, read it, and write it into a new File handle of your choosing.
  4. Something like this should work:

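    require 'open-uri'

    def download_url_to(url, base_path)
      uri = URI(url)
      # The original file name is the last segment of the URL path.
      filename = uri.path.split("/").last
      new_file = File.join(base_path, filename)
      response = uri.open # a Tempfile, or a StringIO for small responses
      # Use write, not puts: puts appends a newline when one is missing,
      # which would corrupt binary data such as images.
      File.open(new_file, "wb") { |fp| fp.write(response.read) }
      new_file
    end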
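
For point 3, one way to copy the downloaded data without caring whether you got a Tempfile or a StringIO is IO.copy_stream, which streams in chunks instead of reading everything into memory. This is only a minimal sketch: save_io_to is a hypothetical helper name, not part of open-uri, and it assumes you are allowed to create the destination directory.

```ruby
require 'fileutils'
require 'stringio'
require 'tempfile'

# Hypothetical helper (not part of open-uri): persist whatever object
# open-uri returned, whether it is a Tempfile or a StringIO.
def save_io_to(io, dest_path)
  # Make sure the target directory exists before writing.
  FileUtils.mkdir_p(File.dirname(dest_path))
  # Start from the beginning in case the object was already read.
  io.rewind
  # Open in binary mode so image bytes are written unchanged.
  File.open(dest_path, "wb") { |out| IO.copy_stream(io, out) }
  dest_path
end
```

With the question's example, this could be called as save_io_to(file, "/users/user_name/images/Snip20180323_40.png"), where file is the object returned by open-uri.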
    

It's worth noting that if the response is smaller than 10 KB, you'll get a StringIO object rather than a Tempfile object. The above solution handles both cases. It also just accepts whatever the last segment of the URL path is, so it's up to you to sanitize it, as well as the contents of the file itself; in most cases you don't want to let clients download arbitrary files onto your system. For example, you may want to be extra sure that the filename doesn't contain sequences like "..\\..\\..", which may be used to write files to unintended locations.
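
As a rough illustration of that kind of sanitizing, here is a minimal sketch. The helper name safe_filename_from, the fallback name "download", and the character whitelist are all assumptions of mine, not anything open-uri provides; adjust them to your own rules.

```ruby
require 'uri'

# Hypothetical helper: derive a conservative local file name from a URL.
def safe_filename_from(url, fallback: "download")
  # Decode percent-escapes first so encoded separators (%2F, %5C) are seen.
  path = URI.decode_www_form_component(URI(url).path.to_s)
  # Treat backslashes as separators too, then keep only the last segment.
  name = path.tr("\\", "/").split("/").last.to_s
  # Replace every character outside a small whitelist with an underscore.
  name = name.gsub(/[^A-Za-z0-9._-]/, "_")
  # Reject empty names and dot-only names such as "." or "..".
  name.match?(/\A\.*\z/) ? fallback : name
end
```

The result can then be joined onto the target directory, e.g. File.join(base_path, safe_filename_from(url)). This only covers the file name; validating the downloaded contents is a separate problem.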

Upvotes: 4
