Reputation: 5
I'm trying to build a web scraper. One of the things I'm trying to add is a smart retry mechanism using urllib3, requests, and BeautifulSoup.
When I set timeout=1 in order to force a failure and exercise the retry, it breaks with an exception. Code below:
import requests
import re
import json
import time
import sys
import logging
from bs4 import BeautifulSoup
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

# get_items takes a dict mapping names to relative links and scrapes the items found at each link
def get_items(self, links):
    itemdict = {}
    for k, v in links.items():
        boolean = True
        # here, we fetch the content from the url, using the requests library
        while boolean:
            try:
                a = requests.Session()
                retries = Retry(total=3, backoff_factor=0.1,
                                status_forcelist=[301, 500, 502, 503, 504])
                a.mount('https://', HTTPAdapter(max_retries=retries))
                page_response = a.get('https://www.XXXXXXX.il' + v, timeout=1)
            except requests.exceptions.Timeout:
                print("Timeout occurred")
                logging.basicConfig(level=logging.DEBUG)
            else:
                boolean = False
        # we use the html parser to parse the url content and store it in a variable
        page_content = BeautifulSoup(page_response.content, "html.parser")
        for i in page_content.find_all('div', attrs={'class': 'prodPrice'}):
            parent = i.parent.parent.contents[0]
            getparentfunc = parent.find("a", attrs={"href": "javascript:void(0)"})
            itemid = re.search(r".*'(\d+)'.*", getparentfunc.attrs['onclick']).groups()[0]
            itemName = re.sub(r'\W+', ' ', i.parent.contents[0].text)
            priceitem = re.sub(r'[\D.]+ ', ' ', i.text)
            itemdict[itemid] = [itemName, priceitem]
    return itemdict
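In case it helps, here is a stripped-down sketch of just the retry part (www.XXXXXXX.il and the path are placeholders for the real site). My guess is that when urllib3 exhausts its retries it raises MaxRetryError, which requests can surface as a ConnectionError rather than a Timeout, so my except clause never catches it:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

# Stripped-down sketch of the setup above; host and path are placeholders
session = requests.Session()
retries = Retry(total=3, backoff_factor=0.1,
                status_forcelist=[301, 500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))

try:
    # timeout=1 forces a failure so the retry behaviour can be observed
    response = session.get('https://www.XXXXXXX.il/some-page', timeout=1)
except requests.exceptions.Timeout:
    print("Timeout occurred")
except requests.exceptions.ConnectionError as err:
    # urllib3's MaxRetryError may arrive wrapped in a ConnectionError here
    print("Retries exhausted:", err)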
I'd appreciate help fixing the retry mechanism efficiently, or any other simple approach. Thanks, Iso
Upvotes: 0
Views: 893
Reputation: 55002
I usually do something like:
import requests

def get(url, retries=3):
    try:
        return requests.get(url)
    except requests.exceptions.RequestException as err:
        # requests.get raises RequestException subclasses (not ValueError) on failure
        print(err)
        if retries < 1:
            raise ValueError('No more retries!')
        return get(url, retries - 1)
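For example (example.com stands in for the real site):

r = get('https://example.com/some-page')
print(r.status_code)

The recursion counts retries down; once the budget is spent, the last failure is turned into the ValueError instead of looping forever.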
Upvotes: 1