Reputation: 3581
I have a Python script running that requests about 1000 URLs over HTTP and logs their responses. Here is the function that downloads the page at a URL.
import os
import urllib2
from urlparse import urlsplit

def downld_url(url, output):
    print "Entered downld_url and scraping the pdf/doc/docx file now..."
    global error
    try:
        # determine all extensions we should account for
        f = urllib2.urlopen(url)
        data = f.read()
        dlfn = urlsplit(url).path.split('.')[-1]
        print "The extension of the file is: " + str(dlfn)
        # write the download locally, push it to S3, then remove the local copy
        dwnladfn = ImageDestinationPath + "/" + output + "." + dlfn
        with open(dwnladfn, "wb") as code:
            code.write(data)
        _Save_image_to_s3(output + "." + dlfn, dwnladfn)
        print dlfn + " file saved to S3"
        os.remove(dwnladfn)
        print dlfn + " file removed from local folder"
        update_database(output, output + "." + dlfn, None)
        return
    except Exception as e:
        error = "download error: " + str(e)
        print "Error in downloading file: " + error
        return
This runs smoothly for the first 100-200 URLs in the pipeline, but after that the responses get very slow and eventually the requests just time out. I am guessing this is because of request overload. Is there some efficient way to do this without overloading the requests?
Upvotes: 0
Views: 296
Reputation: 9753
I don't know where the issue comes from, but if it is related to having too many requests in the same process, you could try multiprocessing as a workaround.
It may also speed up the whole thing, since you can do several tasks at the same time (for instance, one process downloading while another writes to disk, …). I did this for a similar task and it worked much better (it increased the total download speed too).
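For example, here is a minimal sketch of that idea using multiprocessing.Pool. It assumes the downld_url(url, output) function from your question is defined in the same script; the url_jobs list and the pool size of 8 are placeholders you would replace with your own values:

    from multiprocessing import Pool

    def worker(job):
        # each worker process handles one (url, output) pair
        url, output = job
        downld_url(url, output)  # the function from the question

    if __name__ == "__main__":
        # placeholder data: replace with your 1000 (url, output) pairs
        url_jobs = [("http://example.com/file1.pdf", "file1"),
                    ("http://example.com/file2.docx", "file2")]
        pool = Pool(processes=8)    # at most 8 downloads running at once
        pool.map(worker, url_jobs)  # blocks until every job has finished
        pool.close()
        pool.join()

Capping the pool size also means you never have all 1000 requests in flight at once, which should help with the overload you are seeing.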
Upvotes: 1