Reputation: 275
I'm writing a Python program for downloading some pictures of students at my school.
Here is my code:
import os

count = 0
max_c = 1000000
while max_c >= count:
    os.system("curl http://www.tjoernegaard.dk/Faelles/ElevFotos/" + str(count) + ".jpg > " + str(count) + ".jpg")
    count = count + 1
The problem is that I only want to save the jpg if the image actually exists on the server (not a 404). Since I don't have all the image names, I have to send a request for every number between 0 and 1000000, but not all of those images exist, and I only want to save the ones that do. How do I do this (Ubuntu)?
Thank you in advance
Upvotes: 0
Views: 5940
Reputation: 447
You can use the "-f
" arg to fail silently (not printing HTTP errors), eg:
curl -f site.com/file.jpg
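Applied to the loop from the question, that could look roughly like this (a minimal sketch using subprocess, assuming curl is on the PATH; the cleanup step is only a precaution in case a curl version leaves an empty file behind on failure):

import os
import subprocess

base = "http://www.tjoernegaard.dk/Faelles/ElevFotos/"

count = 0
max_c = 1000000
while max_c >= count:
    name = str(count) + ".jpg"
    # -f: fail on HTTP errors such as 404 instead of saving the error page
    # -s: silence the progress meter, -o: write the body straight to the file
    status = subprocess.call(["curl", "-sf", "-o", name, base + name])
    if status != 0 and os.path.exists(name):
        os.remove(name)  # drop any empty leftover file from a failed request
    count = count + 1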
Upvotes: 20
Reputation: 12693
This is old, but I discovered that in bash you can use --fail and curl will fail silently. If the page is an error, it will NOT be downloaded...
Upvotes: 1
Reputation: 49577
I would suggest using the urllib library provided by Python for your purpose.
import urllib

count = 0
max_c = 1000000
while max_c >= count:
    resp = urllib.urlopen("http://www.tjoernegaard.dk/Faelles/ElevFotos/" + str(count) + ".jpg")
    if resp.getcode() == 404:
        pass  # do nothing
    else:
        # do what you got to do, e.g. save the image
        with open(str(count) + ".jpg", "wb") as f:
            f.write(resp.read())
    count = count + 1
Upvotes: 1
Reputation: 22781
import urllib2
import sys

for i in range(1000000):
    try:
        pic = urllib2.urlopen("http://www.tjoernegaard.dk/Faelles/ElevFotos/" + str(i) + ".jpg").read()
        # open in binary write mode so the jpg bytes are written correctly
        with open(str(i).zfill(7) + ".jpg", "wb") as f:
            f.write(pic)
        print "SUCCESS " + str(i)
    except KeyboardInterrupt:
        sys.exit(1)
    except urllib2.HTTPError, e:
        print "ERROR(" + str(e.code) + ") " + str(i)
This should work; a 404 raises an exception, so nothing gets saved for missing images.
Upvotes: 3
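For reference, roughly the same approach on Python 3 would use urllib.request and urllib.error instead of urllib2 (a minimal sketch, not part of the original answer):

import sys
import urllib.request
import urllib.error

for i in range(1000000):
    url = "http://www.tjoernegaard.dk/Faelles/ElevFotos/" + str(i) + ".jpg"
    try:
        pic = urllib.request.urlopen(url).read()
        with open(str(i).zfill(7) + ".jpg", "wb") as f:
            f.write(pic)
        print("SUCCESS " + str(i))
    except KeyboardInterrupt:
        sys.exit(1)
    except urllib.error.HTTPError as e:
        # any HTTP error status (including 404) raises HTTPError, so nothing is saved
        print("ERROR(" + str(e.code) + ") " + str(i))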
Reputation: 2509
The simplest way, I think, would be to use wget instead of curl, which will discard 404 responses automatically.
Upvotes: 0
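Adapted to the question's loop, that could look something like the following (a sketch assuming wget is installed and called via subprocess; by default wget does not keep a file when the server answers with a 404):

import subprocess

base = "http://www.tjoernegaard.dk/Faelles/ElevFotos/"

for count in range(1000000):
    # wget saves the file under its remote name (e.g. 42.jpg) and, by default,
    # keeps nothing when the request fails with a 404
    subprocess.call(["wget", "-q", base + str(count) + ".jpg"])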