Theadamlt

Reputation: 275

Curl only save if not 404

I'm writing a Python program to download some pictures of students at my school.

Here is my code:

import os
count = 0
max_c = 1000000
while max_c >= count:
    os.system("curl http://www.tjoernegaard.dk/Faelles/ElevFotos/"+str(count)+".jpg > "+str(count)+".jpg")
    count=count+1

The problem is that I only want to save the JPG if the image exists on the server (i.e. the response is not a 404). Since I don't know the image names on the server, I have to request every number between 0 and 1000000, but not all of those images exist. How do I save only the images that actually exist (on Ubuntu)?

Thank you in advance

Upvotes: 0

Views: 5940

Answers (5)

Lucas D'Avila

Reputation: 447

You can use the "-f" ("--fail") option to make curl fail silently on HTTP errors, so the server's error page is not written out, e.g.:

curl -f site.com/file.jpg
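
One caveat when combining this with the `>` redirect from the question: the shell creates the output file before curl even runs, so a 404 would still leave an empty .jpg behind. A minimal sketch of one way around that, assuming Python 3 and that curl is on the PATH: let curl write the file itself with --output and check its exit status. The helper names `curl_command` and `download` are made up for illustration.

```python
import os
import subprocess

BASE_URL = "http://www.tjoernegaard.dk/Faelles/ElevFotos/"


def curl_command(url, dest):
    """Build the curl argument list (hypothetical helper)."""
    # --fail:   exit with a non-zero status on HTTP errors such as 404
    # --silent: suppress the progress meter
    # --output: let curl write the file instead of a shell redirect
    return ["curl", "--fail", "--silent", "--output", dest, url]


def download(number):
    """Try to fetch <number>.jpg; return True only if curl succeeded."""
    dest = "%d.jpg" % number
    result = subprocess.run(curl_command(BASE_URL + dest, dest))
    if result.returncode != 0 and os.path.exists(dest):
        # Remove anything left behind by a failed or interrupted transfer.
        os.remove(dest)
    return result.returncode == 0
```

Calling `download(n)` for each n from 0 to 1000000 would then replace the os.system loop from the question.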

Upvotes: 20

Rodrigo

Reputation: 12693

This is old, but I discovered that you can pass --fail to curl and it will fail silently: if the server returns an error page, it will NOT be downloaded.

Upvotes: 1

RanRag

Reputation: 49577

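I would suggest using the urllib library that ships with Python (Python 2 here) for this.

import urllib  # Python 2: urllib.urlopen does not raise on a 404

count = 0
max_c = 1000000
while max_c >= count:
    resp = urllib.urlopen("http://www.tjoernegaard.dk/Faelles/ElevFotos/"+str(count)+".jpg")
    if resp.getcode() == 404:
        pass  # do nothing
    else:
        # do what you have to do, e.g. save the image
        with open(str(count)+".jpg", "wb") as f:
            f.write(resp.read())
    count=count+1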
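
In Python 3 there is no `urllib.urlopen`, and `urllib.request.urlopen` raises on a 404 rather than returning the response, so the status check above does not carry over directly. A rough Python 3 sketch of the same "look at the status code" idea using http.client follows. `save_if_found` and `download_range` are names invented for this example, and treating only status 200 as "exists" is an assumption.

```python
import http.client


def save_if_found(conn, path, dest):
    """GET path over conn; write the body to dest only on a 200 response."""
    conn.request("GET", path)
    resp = conn.getresponse()
    body = resp.read()  # read the whole body so the connection can be reused
    if resp.status != 200:
        return False
    with open(dest, "wb") as f:
        f.write(body)
    return True


def download_range(host, prefix, numbers):
    """Return the numbers whose image existed and was saved."""
    conn = http.client.HTTPConnection(host)
    try:
        return [n for n in numbers
                if save_if_found(conn, "%s%d.jpg" % (prefix, n), "%d.jpg" % n)]
    finally:
        conn.close()
```

For the site in the question this would be called as `download_range("www.tjoernegaard.dk", "/Faelles/ElevFotos/", range(1000001))`. Using one connection object for the whole loop avoids creating a new one for each of the million requests.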

Upvotes: 1

sleeplessnerd

Reputation: 22781

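import urllib2
import sys

for i in range(1000000):
  try:
    pic = urllib2.urlopen("http://www.tjoernegaard.dk/Faelles/ElevFotos/"+str(i)+".jpg").read()
    # open in binary write mode; the default mode is read-only
    with open(str(i).zfill(7)+".jpg", "wb") as f:
      f.write(pic)
    print "SUCCESS "+str(i)
  except KeyboardInterrupt:
    sys.exit(1)
  except urllib2.HTTPError as e:
    # raised for a 404 and other HTTP error statuses
    print "ERROR("+str(e.code)+") "+str(i)

This should work (Python 2): a 404 raises urllib2.HTTPError, so nothing is written for images that do not exist.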
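
For anyone on Python 3, where urllib2 was merged into urllib.request and urllib.error, the same exception-based approach might look roughly like this. It is a sketch, not a drop-in replacement: `fetch_or_none` and `save_image` are hypothetical names, and the `opener` parameter is only there so the function can be tried without hitting the network.

```python
import urllib.error
import urllib.request


def fetch_or_none(url, opener=urllib.request.urlopen):
    """Return the response body, or None if the server sent an HTTP error."""
    try:
        with opener(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as e:
        # A 404 (or any other HTTP error status) ends up here.
        print("ERROR(%d) %s" % (e.code, url))
        return None


def save_image(number, base_url, opener=urllib.request.urlopen):
    """Save <number>.jpg under a zero-padded name; return True if it existed."""
    data = fetch_or_none(base_url + str(number) + ".jpg", opener)
    if data is None:
        return False
    with open(str(number).zfill(7) + ".jpg", "wb") as f:
        f.write(data)
    return True
```

Looping `save_image(i, "http://www.tjoernegaard.dk/Faelles/ElevFotos/")` over the range would mirror the code above. Because the file is opened only after the download succeeded, no empty files are left behind for missing images.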

Upvotes: 3

dhwthompson

Reputation: 2509

The simplest way, I think, would be to use wget instead of curl: wget discards 404 responses automatically, so no file is saved for a missing image.

Upvotes: 0
