dwebb

Reputation: 63

Run the following code without a timeout error

What improvements can I make to the following code so that it runs faster and gives me the results I want? The code takes too long to run: when I run it on my computer, I receive a timeout error instead of the output. The loop covers 3,563 pages (range(1, 3564)). How can I improve it to get rid of the timeout error? The code only completes when I restrict it to a small range of pages.

import pandas as pd
from bs4 import BeautifulSoup, Tag
import requests

data = []
s = "https://www.cupcakemaps.com/search_results?page="
for x in range(1, 3564):
    res = requests.get(s + str(x), timeout=20)
    soup = BeautifulSoup(res.text, 'lxml')
    listings = soup.findAll(class_='grid_element')
    for listing in listings:
        # Each field may be missing, so only call .text on actual Tag objects
        listing_name = listing.find('span', {'class': 'h3 bold inline-block rmargin member-search-full-name'})
        if isinstance(listing_name, Tag):
            listing_name = listing_name.text.strip()
        listing_description = listing.find('p', {'class': 'small member-search-description'})
        if isinstance(listing_description, Tag):
            listing_description = listing_description.text.strip()
        listing_location = listing.find('span', {'class': 'small member-search-location rmargin rpad'})
        if isinstance(listing_location, Tag):
            listing_location = listing_location.text.strip()
        full_dict = {'Title': listing_name, 'Description': listing_description, 'Location': listing_location}
        data.append(full_dict)

df = pd.DataFrame(data)
print(df)

I expect the code to print out a data frame with 3 columns.

Upvotes: 0

Views: 161

Answers (1)

Francisco

Reputation: 70

Have you tried initializing res to None and retrying the request in a while loop, with a try/except around requests.exceptions.Timeout?

import time

for x in range(1, 3564):
    res = None

    while not res:
        try:
            res = requests.get(s + str(x), timeout=20)
        except requests.exceptions.Timeout:
            time.sleep(5)  # wait 5 seconds and try again
    soup = BeautifulSoup(res.text, 'lxml')
    listings = soup.findAll(class_='grid_element')
    for listing in listings:
        listing_name = listing.find('span', {'class': 'h3 bold inline-block rmargin member-search-full-name'})
        if isinstance(listing_name, Tag):
            listing_name = listing_name.text.strip()
        listing_description = listing.find('p', {'class': 'small member-search-description'})
        if isinstance(listing_description, Tag):
            listing_description = listing_description.text.strip()
        listing_location = listing.find('span', {'class': 'small member-search-location rmargin rpad'})
        if isinstance(listing_location, Tag):
            listing_location = listing_location.text.strip()
        full_dict = {'Title': listing_name, 'Description': listing_description, 'Location': listing_location}
        data.append(full_dict)

So we are just initializing res as None, checking that it is still None, and repeating the request while it is. Any time a requests.exceptions.Timeout exception is raised, we catch it, wait 5 seconds, and go back to the top of the while loop.
If a different exception is raised by requests, you can try replacing the except line with:

except requests.exceptions.RequestException:
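
For reference, here is a minimal self-contained sketch (not from the original answer) of the same retry idea with a bounded number of attempts and a growing back-off. The 20-second timeout comes from the question; max_retries and the back-off step are illustrative assumptions. It also treats 4xx/5xx responses as failures via raise_for_status().

import time
import requests

def fetch_page(url, max_retries=5, timeout=20):
    """Return the page HTML, or None if every attempt fails."""
    for attempt in range(max_retries):
        try:
            res = requests.get(url, timeout=timeout)
            res.raise_for_status()  # treat 4xx/5xx responses as failures too
            return res.text
        except requests.exceptions.RequestException:
            time.sleep(5 * (attempt + 1))  # back off a little longer on each retry
    return None

With a helper like this, the main loop can call fetch_page(s + str(x)) and simply skip a page when None comes back, instead of retrying it forever. That matters because a requests.Response evaluates to False for error status codes, so while not res would never exit on a page that consistently returns, say, a 500.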

Upvotes: 1
