SIM

Reputation: 22440

Python scraper shows TimeoutError and WinError in the midst of its activity

When I run my Python script, it scrapes one or two pages and then suddenly breaks with [TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond]. I noticed that the website is very slow to display its content. Is there any workaround? Thanks in advance. Here is the full code:

import requests
from lxml import html

def Startpoint(mpage):
    leaf=1
    while leaf<=mpage:
        link="http://www.austrade.gov.au/"
        address = "http://www.austrade.gov.au/suppliersearch.aspx?smode=AND&ind=Agribusiness%7c%7cArts+%26+Recreation%7c%7cBuilding+%26+Construction%7c%7cBusiness+%26+Other+Services%7c%7cConsumer+Goods%2c+Non-Food%7c%7cDefence%2c+Security+%26+Safety%7c%7cEducation+%26+Training%7c%7cEnvironment+%26+Energy%7c%7cFinance+%26+Insurance%7c%7cFood+%26+Beverage%7c%7cGovernment%7c%7cHealth%2c+Biotechnology+%26+Wellbeing%7c%7cICT%7c%7cManufacturing+(Other)%7c%7cMining%7c%7cTourism+%26+Hospitality%7c%7cTransport&folderid=1736&pg=" + str(leaf)
        try : 
            page = requests.get(address, timeout=30)
        except requests.exceptions.ReadTimeout: 
            print('timed out')
            continue
        page = requests.get(address)
        tree = html.fromstring(page.text)
        titles=tree.xpath('//a[@class="Name"]')
        for title in titles:
            href = link + title.xpath('./@href')[0]
            Endpoint(href)
        leaf+=1

def Endpoint(address):
    try : 
        page = requests.get(address, timeout=30)
    except requests.exceptions.ReadTimeout: 
        print('timed out')
    else : 
        tree=html.fromstring(page.text)
        titles = tree.xpath('//div[@class="contact-details block dark"]')
        for title in titles:
            try :
                Name=title.xpath('.//p[1]/text()')[0] if len(title.xpath('.//p[1]/text()'))>0 else None
                Name1=title.xpath('.//p[3]/text()')[0] if len(title.xpath('.//p[3]/text()'))>0 else None
                Metco=(Name,Name1)
                print(Metco)
            except:
                continue

Startpoint(10)

Upvotes: 0

Views: 597

Answers (1)

t.m.adam

Reputation: 15376

You could catch the timeout exception and let the script continue:

try:
    # Set the maximum timeout, e.g. 30 seconds
    page = requests.get(address, timeout=30)
except requests.exceptions.ReadTimeout:
    print('timed out')
except Exception as ex:
    # Report any other failure by its exception class name
    print(type(ex).__name__)
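
If the site is merely slow, it may also help to retry a failed request a few times before giving up on that page. Below is a minimal sketch of that idea, not a definitive implementation. The helper name `call_with_retries` and its parameters (`retry_on`, `attempts`, `delay`, `sleep`) are hypothetical and not part of `requests`. Note also that `Startpoint` in the question calls `requests.get(address)` a second time, without a timeout, right after the guarded call, so that second call can still raise outside the `try`.

```python
import time


def call_with_retries(func, retry_on, attempts=3, delay=2.0, sleep=time.sleep):
    """Call func() up to `attempts` times, retrying on the given exceptions.

    Hypothetical helper for illustration. Returns func()'s result, or None
    if every attempt raised one of the exceptions in `retry_on`.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as ex:
            print(f"attempt {attempt} failed: {type(ex).__name__}")
            if attempt < attempts:
                sleep(delay)  # pause before the next attempt
    return None


# Usage with requests (assumes the `requests` package is installed):
#
#   import requests
#   page = call_with_retries(
#       lambda: requests.get(address, timeout=30),
#       retry_on=(requests.exceptions.RequestException,),
#   )
#   if page is None:
#       print('giving up on', address)
```

Catching `requests.exceptions.RequestException` covers `ReadTimeout` as well as `ConnectTimeout` and `ConnectionError`. A failed connection attempt such as WinError 10060 most likely surfaces as one of the latter two, so catching `ReadTimeout` alone would probably miss it.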

Upvotes: 2
