Ciasto piekarz

Reputation: 8277

URL fetch gets stuck when multiple URLs are passed

In the code below I am trying to first check the URL's status code, then start the relevant thread, and likewise for adding it to the queue.

However, if there are too many URLs I get a Timeout error. All the code is added below. I also just discovered another bug: if I pass an MP3 file along with some JPEG images, the MP3 file downloads at its correct size but opens as one of the images from the URLs passed.

_fdUtils

def getParser():
    parser = argparse.ArgumentParser(prog='FileDownloader',
        description='Utility to download files from internet')
    parser.add_argument('-v', '--verbose', default=logging.DEBUG,
        help='on by default; pass None or False to suppress shell output')
    parser.add_argument('-st', '--saveTo', default=None, action=FullPaths,
        help='location where you want files to download to')
    parser.add_argument('-urls', nargs='*',
        help='urls of files you want to download.')
    parser.add_argument('-se', nargs='*', default=[1],
        help="split each url passed to urls by the respective split order; "
        "if a url doesn't have a split, a default of 1 is used")
    return parser.parse_args()

def getResponse(url):
    return requests.head(url, allow_redirects=True, timeout=10, headers={'Accept-Encoding': 'identity'})

def isWorkingURL(url):
    response = getResponse(url)
    return response.status_code in [302, 200, 100, 204, 300]

def getUrl(url):
    """ gets the actual url to download file from.
    """
    response = getResponse(url)
    return response.headers.get('location', url)

Error stack trace:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "python/file_download.py", line 181, in run
    _grabAndWriteToDisk(self, split, url, self.__saveTo, 0, self.queue)
  File "python/file_download.py", line 70, in _grabAndWriteToDisk
    resp = requests.get(url, headers={'Range': 'bytes=%s' % irange}, stream=True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg/requests/sessions.py", line 382, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg/requests/sessions.py", line 505, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg/requests/sessions.py", line 167, in resolve_redirects
    allow_redirects=False,
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg/requests/sessions.py", line 485, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.1.0-py2.7.egg/requests/adapters.py", line 381, in send
    raise Timeout(e)
Timeout: HTTPConnectionPool(host='ia600506.us.archive.org', port=80): Read timed out. (read timeout=<object object at 0x1002b40b0>)

Here's the complete code:

import argparse
import logging
import Queue
import os
import requests
import signal
import socket
import sys
import time
import threading
import utils as _fdUtils
from collections import OrderedDict
from itertools import izip_longest
from socket import error as SocketError, timeout as SocketTimeout

# timeout in seconds
TIMEOUT = 10
socket.setdefaulttimeout(TIMEOUT)

DESKTOP_PATH = os.path.expanduser("~/Desktop")

appName = 'FileDownloader'

logFile = os.path.join(DESKTOP_PATH, '%s.log' % appName)

_log = _fdUtils.fdLogger(appName, logFile, logging.DEBUG, logging.DEBUG, console_level=logging.DEBUG)

queue = Queue.Queue()
STOP_REQUEST = threading.Event()
maxSplits = threading.BoundedSemaphore(3)
threadLimiter = threading.BoundedSemaphore(5)
lock = threading.Lock()

pulledSize = 0
dataDict = {}

def _grabAndWriteToDisk(threadName, url, saveTo, first=None, queue=None, mode='wb', irange=None):
    """ Function to download file..

        Args:
            url(str): url of file to download
            saveTo(str): path where to save file
            first(int): starting byte of the range
            queue(Queue.Queue): queue object to set status for file download
            mode(str): mode of file to be downloaded
            irange(str): range of byte to download

    """
    fileName = _fdUtils.getFileName(url)
    filePath = os.path.join(saveTo, fileName)
    fileSize = _fdUtils.getUrlSizeInBytes(url)
    downloadedFileSize = 0 if not first else first
    block_sz = 8192
    resp = requests.get(url, headers={'Range': 'bytes=%s' % irange}, stream=True)
    for fileBuffer in resp.iter_content(block_sz):
        if not fileBuffer:
            break

        with open(filePath, mode) as fd:
            downloadedFileSize += len(fileBuffer)
            fd.write(fileBuffer)
            mode = 'ab'  # append in binary mode for subsequent chunks

            status = r"%10d  [%3.2f%%]" % (downloadedFileSize, downloadedFileSize * 100. / fileSize)
            status = status + chr(8)*(len(status)+1)
            sys.stdout.write('%s\r' % status)
            time.sleep(.01)
            sys.stdout.flush()
            if downloadedFileSize == fileSize:
                STOP_REQUEST.set()
                queue.task_done()
                _log.debug("Downloaded  %s %s%%  using %s and saved to %s", fileName,
                    downloadedFileSize * 100. / fileSize, threadName.getName(), saveTo)


def _downloadChunk(url, idx, irange, fileName, sizeInBytes):
    _log.debug("Downloading %s for first chunk %s of %s " % (irange, idx+1, fileName))
    pulledSize = irange[-1]
    try:
        resp = requests.get(url, allow_redirects=False,  timeout=TIMEOUT,
                            headers={'Range': 'bytes=%s-%s' % (str(irange[0]), str(irange[-1]))}, 
                            stream=True)
    except (SocketTimeout, requests.exceptions.RequestException) as e:
        _log.error(e)
        return


    chunk_size = str(irange[-1])
    for chunk in resp.iter_content(chunk_size):
        status = r"%10d  [%3.2f%%]" % (pulledSize, pulledSize * 100. / int(chunk_size))
        status = status + chr(8)*(len(status)+1)
        sys.stdout.write('%s\r' % status)
        sys.stdout.flush()
        pulledSize += len(chunk)
        dataDict[idx] = chunk
        time.sleep(.03)
        if pulledSize == sizeInBytes:
            _log.info("%s downloaded %3.0f%%", fileName, pulledSize * 100. / sizeInBytes)

class ThreadedFetch(threading.Thread):
    """ docstring for ThreadedFetch
    """
    def __init__(self, saveTo, queue):
        super(ThreadedFetch, self).__init__()
        self.queue = queue
        self.__saveTo = saveTo

    def run(self):
        threadLimiter.acquire()
        try:
            items = self.queue.get()
            url = items[0]
            split = items[-1]
            fileName = _fdUtils.getFileName(url)

            # grab split chunks in separate thread.
            if split > 1:
                maxSplits.acquire()
                try:

                    sizeInBytes = _fdUtils.getUrlSizeInBytes(url)
                    byteRanges = _fdUtils.getRangeSegements(sizeInBytes, split)
                    filePath = os.path.join(self.__saveTo, fileName)

                    downloaders = [
                        threading.Thread(
                            target=_downloadChunk, 
                            args=(url, idx, irange, fileName, sizeInBytes),
                        )
                        for idx, irange in enumerate(byteRanges)
                        ]

                    # start the chunk threads so they run in parallel
                    for th in downloaders:
                        th.start()

                    # wait for all threads to finish, which ensures
                    # dataDict is fully populated before writing
                    for th in downloaders:
                        th.join()
                    downloadedSize = 0
                    with open(filePath, 'wb') as fh:
                        for _idx, chunk in sorted(dataDict.iteritems()):
                            downloadedSize += len(chunk)
                            status = r"%10d  [%3.2f%%]" % (downloadedSize, downloadedSize * 100. / sizeInBytes)
                            status = status + chr(8)*(len(status)+1)
                            fh.write(chunk)
                            sys.stdout.write('%s\r' % status)
                            time.sleep(.04)
                            sys.stdout.flush()
                            if downloadedSize == sizeInBytes:
                                _log.info("%s, saved to %s", fileName, self.__saveTo)
                    self.queue.task_done()
                finally:
                    maxSplits.release()

            else:
                while not STOP_REQUEST.isSet():
                    self.setName("primary_%s_thread" % fileName.split(".")[0])
                    # if downloading the whole file in a single chunk, no need
                    # to start a new thread, so download directly here.
                    _grabAndWriteToDisk(self, url, self.__saveTo, 0, self.queue)
        finally:
            threadLimiter.release()

def main(appName):
    args = _fdUtils.getParser()

    saveTo = args.saveTo if args.saveTo else DESKTOP_PATH
    # spawn a pool of threads and pass each the queue instance;
    # each url will be downloaded concurrently

    unOrdUrls = dict(izip_longest(args.urls, args.se, fillvalue=1))
    ordUrls = OrderedDict([(k, unOrdUrls[k]) for k in sorted(unOrdUrls, key=unOrdUrls.get, reverse=False) if _fdUtils.isWorkingURL(k, _log) and _fdUtils.notOnDisk(k, saveTo)])
    print "length: %s " % len(ordUrls)
    for i in xrange(len(ordUrls)):
        t = ThreadedFetch(saveTo, queue)
        t.daemon = True
        t.start()

    try:
        # populate queue with data 
        for url, split in ordUrls.iteritems():
            url = _fdUtils.getUrl(url)
            print url
            queue.put((url, int(split)))

        # wait on the queue until everything has been processed
        queue.join()
        _log.info('All tasks completed.')
    except (KeyboardInterrupt, SystemExit):
        _log.critical('! Received keyboard interrupt, quitting threads.')

if __name__ == "__main__":
    # change the name of MainThread.
    threading.currentThread().setName("FileDownloader")
    myapp = threading.currentThread().getName()
    main(myapp)

Upvotes: 0

Views: 249

Answers (1)

abarnert

Reputation: 365935

I see two problems in your code. Since it's incomplete, I'm not sure how it's supposed to work, so I can't promise either one is the particular one you're running into first, but I'm pretty sure you need to fix both.

First:

queue.put((_fdUtils.getUrl(url), int(split)))

That's going to call _fdUtils.getUrl(url) in the main thread, and put the result on the queue. Your comments clearly imply that you intended the downloading to happen on the background threads.

If you wanted to pass a function to be called, just pass the function and its argument as separate members of the tuple, or wrap it up in a closure or a partial:

queue.put((lambda: _fdUtils.getUrl(url), int(split)))
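
Or, equivalently, wrap it in functools.partial. For completeness, here's a minimal sketch of how the consumer side might then look; worker and download are hypothetical names I'm assuming for illustration, not your actual run() method:

from functools import partial

# put a callable on the queue so the (possibly slow) redirect
# resolution happens on a worker thread, not the main thread
queue.put((partial(_fdUtils.getUrl, url), int(split)))

def worker(queue):
    while True:
        get_url, split = queue.get()
        try:
            url = get_url()        # resolve redirects off the main thread
            download(url, split)   # hypothetical download step
        finally:
            queue.task_done()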

Second:

    t = ThreadedFetch(saveTo, queue)
    t.daemon = True
    t.start()

This starts a thread for every URL. That's almost never a good idea. Generally, downloaders don't use more than 4-16 threads at a time, and no more than 2-4 to the same site. You could easily be timing out because you're spamming some site too fast and its server or router is making you back off for a while. Or, with a huge number of requests, you could be flooding your own network and blocking ACKs or even rebooting the router (especially if you have either a cheap home WiFi router or ADSL with a crappy provider).

Also, a much simpler way to do this would be to use a smart pool, like a multiprocessing.dummy.Pool (multiprocessing.dummy means it acts like the multiprocessing module but uses threads) or, even better, a concurrent.futures.ThreadPoolExecutor. In fact, if you look at the docs, a parallel downloader is the first example for ThreadPoolExecutor.
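
As a rough illustration (my sketch, not code from the question), the whole download loop could collapse to something like this with multiprocessing.dummy; fetch_one is an assumed stand-in for the real per-file download logic:

from multiprocessing.dummy import Pool  # Pool API, but backed by threads
import requests

def fetch_one(url):
    # assumed helper: stream one URL to disk and return its local name
    resp = requests.get(url, stream=True, timeout=10)
    resp.raise_for_status()
    fileName = url.rsplit('/', 1)[-1]
    with open(fileName, 'wb') as fh:
        for chunk in resp.iter_content(8192):
            fh.write(chunk)
    return fileName

urls = ['http://example.com/a.jpg', 'http://example.com/b.mp3']  # stand-in list
pool = Pool(4)  # cap concurrency at 4 simultaneous downloads
results = pool.map(fetch_one, urls)
pool.close()
pool.join()

The pool caps concurrency for you, so all the BoundedSemaphore bookkeeping disappears.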

Upvotes: 2
