djiso1

Reputation: 5

Retry requests mechanism

I'm trying to build a web scraper project, and one of the things I'm trying to do is a smart retry mechanism using urllib3, requests, and BeautifulSoup.

When I set timeout=1 in order to force a failure and exercise the retry logic, it breaks with an exception. Code below:

import requests
import re
import logging
from bs4 import BeautifulSoup
import json
import time
import sys
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

# get_items takes a dict of links and scrapes the items found on each linked page

def get_items(self, url_dict):
    itemdict = {}
    for k, v in url_dict.items():
        # fetch the content from the url, retrying on timeouts
        while True:
            try:
                session = requests.Session()
                retries = Retry(total=3, backoff_factor=0.1,
                                status_forcelist=[301, 500, 502, 503, 504])
                session.mount('https://', HTTPAdapter(max_retries=retries))
                page_response = session.get('https://www.XXXXXXX.il' + v, timeout=1)
            except requests.exceptions.Timeout:
                print("Timeout occurred")
                logging.basicConfig(level=logging.DEBUG)  # surface urllib3's retry logs
            else:
                break

        # parse the url content with the html parser and walk the product divs
        page_content = BeautifulSoup(page_response.content, "html.parser")
        for i in page_content.find_all('div', attrs={'class': 'prodPrice'}):
            parent = i.parent.parent.contents[0]
            getparentfunc = parent.find("a", attrs={"href": "javascript:void(0)"})
            # the numeric item id is embedded in the onclick handler
            itemid = re.search(r".*'(\d+)'.*", getparentfunc.attrs['onclick']).groups()[0]
            itemName = re.sub(r'\W+', ' ', i.parent.contents[0].text)
            priceitem = re.sub(r'[\D.]+ ', ' ', i.text)
            itemdict[itemid] = [itemName, priceitem]
    return itemdict
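
For reference, here is a minimal standalone version of the same Session + Retry setup; I think it reproduces the failure. The request goes to httpbin.org/status/503 purely as a test endpoint that always answers 503, so the mounted Retry uses up its budget:

import logging

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

logging.basicConfig(level=logging.DEBUG)

session = requests.Session()
retries = Retry(total=3, backoff_factor=0.1,
                status_forcelist=[301, 500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

try:
    response = session.get('https://httpbin.org/status/503', timeout=1)
except requests.exceptions.Timeout:
    print("Timeout occurred")
except requests.exceptions.RetryError:
    # raised once the Retry budget is exhausted on status_forcelist responses
    print("Too many retries")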

I'd appreciate an efficient fix for the retry mechanism, or any other simple approach. Thanks, Iso

Upvotes: 0

Views: 893

Answers (1)

pguardiario

Reputation: 55002

I usually do something like:

import requests

def get(url, retries=3):
    try:
        r = requests.get(url)
        return r
    except requests.exceptions.RequestException as err:
        print(err)
        if retries < 1:
            raise ValueError('No more retries!')
        return get(url, retries - 1)
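
Called like this (the URL is just an example), the ValueError propagates to the caller once all the attempts are used up. Since the recursion depth equals the retry count, a plain loop would work just as well for larger budgets:

try:
    r = get('https://example.com/')
    print(r.status_code)
except ValueError as err:
    print(err)  # 'No more retries!' after every attempt has failed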

Upvotes: 1
