conol

Reputation: 425

How to skip a page if the request response is 404 or 505

I wrote a scraper in Python. Unfortunately, when the scraper encounters a 404 or 505 page it stops working. How can I skip those pages in my loop to avoid this problem?

Here is my code:

import requests
from bs4 import BeautifulSoup
import time
c = 40622  # id appended to the URL; incremented on every iteration
for a in range(10):
    url = 'https://example.com/rockery/' + str(c)
    c += 1
    print('-------------------------------------------------------------------------------------')
    print(url)
    print(c)
    time.sleep(5)
    response = requests.get(url)
    html = response.content
    soup = BeautifulSoup(html, "html.parser")

    name = soup.find('a', attrs={'class': 'name-hyperlink'})
    name_final = name.text

    name_details = soup.find('div', attrs={'class': 'post-text'})
    name_details_final = name_details.text

    name_taglist = soup.find('div', attrs={'class': 'post-taglist'})
    name_taglist_final = name_taglist.text

    name_accepted_tmp = soup.find('div', attrs={'class': 'accepted-name'})
    name_accepted = name_accepted_tmp.find('div', attrs={'class': 'post-text'})
    name_accepted_final = name_accepted.text

    print('q_title=', name_final, '\nq_details=', name_details_final, '\nq_answer=', name_accepted_final)
    print('-------------------------------------------------------------------------------------')

Here is the error I encounter when I hit a 404 or 505 page:

Traceback (most recent call last):
  File "scrab.py", line 18, in <module>
    name_final = name.text
AttributeError: 'NoneType' object has no attribute 'text'
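The traceback comes from BeautifulSoup rather than requests: on a 404 or 505 page the expected markup is absent, so soup.find(...) returns None, and calling .text on None raises this AttributeError. A minimal sketch reproducing the same error with dummy HTML (the HTML string is only an illustration, not taken from the real site):

from bs4 import BeautifulSoup

# An error page does not contain the 'name-hyperlink' element,
# so find() returns None instead of a Tag.
soup = BeautifulSoup('<html><body>Not Found</body></html>', 'html.parser')
name = soup.find('a', attrs={'class': 'name-hyperlink'})
print(name)    # prints: None
# name.text    # would raise: AttributeError: 'NoneType' object has no attribute 'text'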

Upvotes: 1

Views: 3931

Answers (1)

TheoretiCAL

Reputation: 20571

Check the status code of the response; if it is not 200 (OK), skip the page by jumping to the next iteration of your loop with a continue statement:

response = requests.get(url)
if response.status_code != 200:  # requests.codes.ok can stand in for the literal 200
    continue
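
For context, here is a minimal sketch of how that check could slot into the loop from your question (same URL pattern and sleep as in your code; the extra None check is a defensive assumption in case a 200 page is still missing the expected element):

import requests
from bs4 import BeautifulSoup
import time

c = 40622
for a in range(10):
    url = 'https://example.com/rockery/' + str(c)
    c += 1
    time.sleep(5)

    response = requests.get(url)
    if response.status_code != 200:  # 404, 505, or any other error status
        print('skipping', url, 'returned', response.status_code)
        continue  # move on to the next page

    soup = BeautifulSoup(response.content, 'html.parser')
    name = soup.find('a', attrs={'class': 'name-hyperlink'})
    if name is None:  # page loaded but the element is missing
        continue
    print('q_title=', name.text)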

Upvotes: 7
