Blankman

Reputation: 267230

Downloading an image, want to save to folder, check if file exists

I have a recordset (SQLAlchemy) of products that I am looping over, and for each product I want to download an image and save it to a folder.

If the folder doesn't exist, I want to create it.

I also want to check first whether the image file already exists in the folder. If it does, skip that row instead of downloading it again.

/myscript.py
/images/

I want the images folder to be a folder in the same directory as my script file, wherever it may be stored.

I have so far:

import urllib2

q = session.query(products)

for p in q:
    if p.url:
        req = urllib2.Request(p.url)
        try:
            response = urllib2.urlopen(req)
            image = response.read()

            # ??? (save the image here)
        except urllib2.URLError as e:
            print e

Upvotes: 4

Views: 7859

Answers (2)

Alex Martelli

Reputation: 882421

The filename should be in response.info()['Content-Disposition'] (as filename=something after a semicolon in that string). If it isn't (the header is missing, has no semicolon, or has no filename part), you can use urlparse.urlsplit(p.url) and take the os.path.basename of the last non-blank component (or, more pragmatically, though it would deeply offend purists, just p.url.split('/')[-1] ;-).
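As a rough sketch of that lookup order in Python 3 (where urlparse has become urllib.parse), something like the following could work. The function name filename_for is made up for illustration, and the header value is passed in as a plain string so the snippet runs without a network connection.

```python
import os
from email.message import Message
from urllib.parse import unquote, urlsplit


def filename_for(url, content_disposition=None):
    """Pick a filename from a Content-Disposition value, else from the URL path.

    Hypothetical helper: returns None when neither source yields a name.
    """
    if content_disposition:
        # Let the standard library parse "attachment; filename=..." for us.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            # basename() drops any directory part a server might send.
            return os.path.basename(name)
    # Fall back to the last non-blank segment of the URL path,
    # ignoring the query string and fragment.
    path = unquote(urlsplit(url).path).rstrip("/")
    return os.path.basename(path) or None


print(filename_for("http://example.com/images/cat.png?size=large"))
print(filename_for("http://example.com/x", 'attachment; filename="dog.jpg"'))
```

With a real Python 3 response object, the header value would come from response.headers.get("Content-Disposition").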

So much for the filename; call it, say, fn.

The directory where your script lives is sd = os.path.dirname(__file__).

Its images subdirectory is therefore clearly sdsd = os.path.join(sd, 'images').

To check if that subdirectory exists, and make it otherwise,

if not os.path.exists(sdsd): os.makedirs(sdsd)

To check if the file you want to write already exists,

if os.path.exists(os.path.join(sdsd, fn)): ...
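For reference, the directory and file checks above might look like this in Python 3, where os.makedirs accepts exist_ok=True and so needs no separate existence test. The helper names (images_dir_for, ensure_dir, already_saved) are hypothetical, and the script path is passed in explicitly rather than read from __file__.

```python
import os


def images_dir_for(script_path, subdir="images"):
    """Return the path of <directory containing script_path>/<subdir>."""
    return os.path.join(os.path.dirname(os.path.abspath(script_path)), subdir)


def ensure_dir(path):
    """Create the directory if it is missing; do nothing if it exists."""
    os.makedirs(path, exist_ok=True)
    return path


def already_saved(directory, fn):
    """True if a file named fn is already present in directory."""
    return os.path.exists(os.path.join(directory, fn))
```

In a script this would typically be called as ensure_dir(images_dir_for(__file__)), followed by already_saved(...) for each row.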

All of this code goes where you have ???. It's a lot, so it's clearly better to make it a function taking p.url and response as arguments (it can read image on its own;-) and possibly taking __file__ as well if you want the freedom to move that function into its own separate module later (I'd recommend that!).
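One possible shape for such a function, sketched for Python 3 under a few assumptions: the name save_image and its parameters are invented here, the filename comes only from the URL path (the pragmatic shortcut), and response is anything with a read() method.

```python
import io
import os
from urllib.parse import urlsplit


def save_image(url, response, script_file, subdir="images"):
    """Write the response body to <script dir>/<subdir>/<name from url>.

    Hypothetical helper: returns the written path, or None when the
    URL yields no filename or the file already exists.
    """
    target_dir = os.path.join(os.path.dirname(os.path.abspath(script_file)), subdir)
    os.makedirs(target_dir, exist_ok=True)

    # Pragmatic filename choice: last segment of the URL path.
    fn = os.path.basename(urlsplit(url).path)
    if not fn:
        return None

    target = os.path.join(target_dir, fn)
    if os.path.exists(target):
        return None  # already downloaded, skip this row

    with open(target, "wb") as out:
        out.write(response.read())
    return target
```

In the loop it would be called as save_image(p.url, response, __file__). When the filename is taken from the URL alone, the existence check could also run before urlopen, which would avoid the request entirely for rows that are already saved.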

Of course, you need to import os for all those os and os.path calls, and also import urlparse if you decide to use the latter standard library module.

Upvotes: 1

Philipp

Reputation: 49852

I think you can just use urllib.urlretrieve here:

import errno
import os
import urllib

def require_dir(path):
    try:
        os.makedirs(path)
    except OSError as exc:
        if exc.errno != errno.EEXIST:
            raise

directory = os.path.join(os.path.dirname(os.path.abspath(__file__)), "images")
require_dir(directory)
filename = os.path.join(directory, "stackoverflow.html")

if not os.path.exists(filename):
    urllib.urlretrieve("http://stackoverflow.com", filename)
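On Python 3 the same idea would use urllib.request.urlretrieve, since urllib.urlretrieve moved there. A minimal sketch, with fetch_once as a made-up wrapper name:

```python
import os
import urllib.request


def fetch_once(url, filename):
    """Download url to filename unless the file is already there.

    Hypothetical helper: returns True if a download happened, False if skipped.
    """
    if os.path.exists(filename):
        return False
    # Create the target directory on demand; exist_ok avoids the errno check.
    os.makedirs(os.path.dirname(os.path.abspath(filename)), exist_ok=True)
    urllib.request.urlretrieve(url, filename)
    return True
```

For example, fetch_once("http://stackoverflow.com", os.path.join(directory, "stackoverflow.html")) would mirror the snippet above.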

Upvotes: 10
