AT_1965

Reputation: 107

requests.exceptions.MissingSchema: Invalid URL (with bs4)

I am getting this error: requests.exceptions.MissingSchema: Invalid URL 'http:/1525/bg.png': No schema supplied. Perhaps you meant http://http:/1525/bg.png?

I don't really care why the error happened; I want to be able to catch any invalid-URL errors, print a message, and proceed with the rest of the code.

Below is my code, where I'm trying to use try/except for that specific error, but it's not working...

# load xkcd page
# save comic image on that page
# follow <previous> comic link
# repeat until last comic is reached

import webbrowser, bs4, os, requests

url = 'http://xkcd.com/1526/'
os.makedirs('xkcd', exist_ok=True)

while not url.endswith('#'): # - last page

    # download the page
    print('Downloading page %s...' % (url))
    res = requests.get(url)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, "html.parser")

    # find url of the comic image (<div id="comic"><img src="........"></div>)
    comicElem = soup.select('#comic img')
    if comicElem == []:
        print('Could not find any images')
    else:
        comicUrl = 'http:' + comicElem[0].get('src')

        # download the image
        print('Downloading image... %s' % (comicUrl))
        res = requests.get(comicUrl)
        try:
            res.raise_for_status()
        except requests.exceptions.MissingSchema as err:
            print(err)
            continue

        # save image to folder
        imageFile = open(os.path.join('xkcd',
                                      os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(1000000):
            imageFile.write(chunk)
        imageFile.close()

    # get <previous> button url
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prevLink.get('href')

print('Done')

What am I doing wrong? (I'm on Python 3.5.) Thanks a lot in advance...

Upvotes: 0

Views: 5439

Answers (3)

wonderkid2

Reputation: 4864

The reason your try/except block isn't catching the exception is that the error is raised at the line

res = requests.get(comicUrl)

which is above the try keyword.

Keeping your code as is, moving the try line up one line, so that requests.get(comicUrl) sits inside the try block, will fix it.
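
A minimal sketch of that change, wrapped in a hypothetical helper (fetch_or_none is my name, not part of requests) so it can be run on its own. Catching InvalidURL alongside MissingSchema is an assumption on my part: depending on the requests version, a URL like 'http:/1525/bg.png' may be reported as one or the other. Both are raised while the request is being prepared, before anything is sent.

```python
import requests


def fetch_or_none(url):
    """Return the response for url, or None if the URL is malformed."""
    try:
        # requests.get() itself raises on a malformed URL,
        # so the call has to sit inside the try block.
        res = requests.get(url, timeout=10)
        res.raise_for_status()
    except (requests.exceptions.MissingSchema,
            requests.exceptions.InvalidURL) as err:
        print('Skipping %s: %s' % (url, err))
        return None
    return res
```

In the question's loop, a None result is the case where the image gets skipped. The previous-link lookup still has to run afterwards so that url changes before the next iteration.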

Upvotes: 0

Er.Ankit H Gandhi

Reputation: 656

Try this if you run into this kind of error when an invalid URL is used.

Solution:

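import requests

correct_url = False
url = 'Ankit Gandhi'  # not a valid URL; try 'https://gmail.com' instead
try:
    res = requests.get(url)
    correct_url = True
except requests.exceptions.RequestException:
    # RequestException is the base class of the errors requests raises,
    # including MissingSchema
    print("Please enter a valid URL")
if correct_url:
    # do your operation with res here
    print("Correct URL")

Hope this helps.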

Upvotes: 0

danidee

Reputation: 9624

If you don't care about the error (which I see as bad programming), just use a bare except clause, which catches all exceptions.

#download the image
print('Downloading image... %s' % (comicUrl))
try:
    res = requests.get(comicUrl) # moved inside the try block
    res.raise_for_status()
except:
    continue

On the other hand, if your except block isn't catching the exception, it's because the exception is actually raised outside your try block. Move requests.get into the try block and the exception handling should work (if you still need it).
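
To illustrate why the placement matters, here is a small sketch that needs no network access. fake_get is a hypothetical stand-in for requests.get (it is not part of requests) that raises ValueError for any URL without '://'. The two wrappers differ only in where the call sits.

```python
def fake_get(url):
    """Hypothetical stand-in for requests.get: rejects URLs without '://'."""
    if '://' not in url:
        raise ValueError('Invalid URL %r' % url)
    return 'response for ' + url


def call_outside_try(url):
    res = fake_get(url)       # raises here, before the try block is entered
    try:
        return res
    except ValueError:
        return None           # never reached for a bad URL


def call_inside_try(url):
    try:
        return fake_get(url)  # raises here, inside the try block
    except ValueError:
        return None           # so the handler runs
```

With a bad URL, call_outside_try lets the ValueError propagate to the caller, while call_inside_try returns None.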

Upvotes: 1
