Reputation: 267230
So I have a recordset (SQLAlchemy) of products that I am looping over, and for each one I want to download an image and save it to a folder.
If the folder doesn't exist, I want to create it.
Also, I want to first check whether the image file already exists in the folder. If it does, don't download it; just skip that row.
/myscript.py
/images/
That is, I want the images folder to sit in the same directory as my script file, wherever that happens to be stored.
Here's what I have so far:
import urllib2

q = session.query(products)
for p in q:
    if p.url:
        req = urllib2.Request(p.url)
        try:
            response = urllib2.urlopen(req)
            image = response.read()
            # ??? save `image` into the images folder here
        except urllib2.URLError, e:
            print e
Upvotes: 4
Reputation: 882421
The filename should be in response.info()['Content-Disposition'] (as a filename=something after a semicolon in that string). If it isn't there (the header is missing, has no semicolon, or has no filename part), you can use urlparse.urlsplit(p.url) and take the os.path.basename of the last non-blank component of its path (or, more pragmatically but in a way that would deeply offend purists, just p.url.split('/')[-1] ;-).
So much for the filename; call it, e.g., fn.
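A minimal Python 3 sketch of that filename logic (in Python 3, urlparse lives in urllib.parse). pick_filename is a hypothetical helper name; it takes the raw Content-Disposition value (or None) plus the URL, tries the filename= parameter first, and otherwise uses the last non-blank segment of the URL's path, so a query string or trailing slash doesn't end up in the name. It deliberately ignores the RFC 2231 filename*= form.

```python
import os
from urllib.parse import urlsplit


def pick_filename(url, content_disposition=None):
    """Choose a local filename for a download (hypothetical helper).

    Tries the filename= parameter of a Content-Disposition header first,
    then falls back to the last non-blank segment of the URL's path.
    Returns None if neither yields a name.
    """
    if content_disposition:
        # e.g. 'attachment; filename="cat.jpg"': parameters follow semicolons
        for part in content_disposition.split(";")[1:]:
            key, sep, value = part.strip().partition("=")
            if sep and key.strip().lower() == "filename":
                name = value.strip().strip('"')
                if name:
                    # basename() stops names like "../../evil.txt" escaping the folder
                    return os.path.basename(name)
    # Fallback: only the path component, so query string and fragment are dropped
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[-1] if segments else None
```

For example, pick_filename("http://example.com/img/a.jpg?size=2") gives "a.jpg", and pick_filename("http://example.com/dl", 'attachment; filename="cat.jpg"') gives "cat.jpg".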
The directory where your script lives is sd = os.path.dirname(__file__). Its images subdirectory is therefore clearly sdsd = os.path.join(sd, 'images').
To check whether that subdirectory exists, and create it otherwise:
if not os.path.exists(sdsd): os.makedirs(sdsd)
To check whether the file you want to write already exists:
if os.path.exists(os.path.join(sdsd, fn)): ...
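The directory steps above, sketched for Python 3. images_dir and already_downloaded are hypothetical names. The sketch adds os.path.abspath so the result doesn't depend on how the script was launched, and uses os.makedirs(..., exist_ok=True) (Python 3.2+) instead of the separate exists()/create pair, which also avoids a race if two processes create the folder at the same moment.

```python
import os


def images_dir(script_file):
    """Return <directory of script_file>/images, creating it if needed."""
    # abspath: a bare "myscript.py" would otherwise give an empty dirname
    sd = os.path.dirname(os.path.abspath(script_file))
    sdsd = os.path.join(sd, "images")
    # exist_ok=True: no error if the folder is already there
    os.makedirs(sdsd, exist_ok=True)
    return sdsd


def already_downloaded(directory, fn):
    """True if directory/fn exists, i.e. this row can be skipped."""
    return os.path.exists(os.path.join(directory, fn))
```

In the script itself you would call images_dir(__file__) once before the loop and skip any row for which already_downloaded(...) is true.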
All of this code goes where you have ???. It's a lot, so it's clearly better to make it a function taking p.url and response as arguments (it can read image on its own ;-), and possibly taking __file__ as well if you want the freedom to move that function into its own separate module later (I'd recommend that!).
Of course, you need to import os for all those os and os.path calls, and also import urlparse if you decide to use that standard library module.
Upvotes: 1
Reputation: 49852
I think you can just use urllib.urlretrieve here:
import errno
import os
import urllib

def require_dir(path):
    # Create path (and any missing parents); ignore "already exists".
    try:
        os.makedirs(path)
    except OSError, exc:
        if exc.errno != errno.EEXIST:
            raise

directory = os.path.join(os.path.dirname(os.path.abspath(__file__)), "images")
require_dir(directory)

filename = os.path.join(directory, "stackoverflow.html")
# Only download if the file isn't already there.
if not os.path.exists(filename):
    urllib.urlretrieve("http://stackoverflow.com", filename)
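For Python 3, a rough equivalent of the code above: urlretrieve moved to urllib.request (where it is kept as part of the legacy interface), and os.makedirs(..., exist_ok=True) replaces the errno check in require_dir. fetch_once is a hypothetical name; it returns whether a download actually happened.

```python
import os
import urllib.request


def fetch_once(url, directory, name):
    """Download url to directory/name unless that file already exists.

    Returns True if a download happened, False if it was skipped.
    """
    os.makedirs(directory, exist_ok=True)  # Python 3 version of require_dir()
    target = os.path.join(directory, name)
    if os.path.exists(target):
        return False
    urllib.request.urlretrieve(url, target)
    return True
```

Usage mirroring the answer: fetch_once("http://stackoverflow.com", os.path.join(os.path.dirname(os.path.abspath(__file__)), "images"), "stackoverflow.html").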
Upvotes: 10