Reputation: 63
What improvements can I make to the following code so that it runs faster and gives me the results I want? The code takes too long to run. When I run it on my computer, I get a timeout error. The code loops through 3564 pages. How can I improve it to get rid of the timeout error? It only works when I restrict it to a small range of pages.
import pandas as pd
from bs4 import BeautifulSoup, Tag
import requests

data = []
s = "https://www.cupcakemaps.com/search_results?page="
for x in range(1, 3564):
    res = requests.get(s + str(x), timeout=20)
    soup = BeautifulSoup(res.text, 'lxml')
    listings = soup.findAll(class_='grid_element')
    for listing in listings:
        listing_name = listing.find('span', {'class': 'h3 bold inline-block rmargin member-search-full-name'})
        if isinstance(listing_name, Tag):
            listing_name = listing_name.text.strip()
        listing_description = listing.find('p', {'class': 'small member-search-description'})
        if isinstance(listing_description, Tag):
            listing_description = listing_description.text.strip()
        listing_location = listing.find('span', {'class': 'small member-search-location rmargin rpad'})
        if isinstance(listing_location, Tag):
            listing_location = listing_location.text.strip()
        full_dict = {'Title': listing_name, 'Description': listing_description, 'Location': listing_location}
        data.append(full_dict)
df = pd.DataFrame(data)
print(df)
I expect the code to print out a data frame with 3 columns.
Upvotes: 0
Views: 161
Reputation: 70
Have you tried initializing res to None and repeating the request in a while loop, with a try/except that catches the Timeout exception, until res is no longer None?
import time

# Reuses the imports, s and data from the question.
for x in range(1, 3564):
    res = None
    # Compare against None explicitly: a Response with a 4xx/5xx status is
    # falsy, so "while not res" would re-request such pages forever.
    while res is None:
        try:
            res = requests.get(s + str(x), timeout=20)
        except requests.exceptions.Timeout:
            time.sleep(5)  # wait 5 seconds and try again
    soup = BeautifulSoup(res.text, 'lxml')
    listings = soup.findAll(class_='grid_element')
    for listing in listings:
        listing_name = listing.find('span', {'class': 'h3 bold inline-block rmargin member-search-full-name'})
        if isinstance(listing_name, Tag):
            listing_name = listing_name.text.strip()
        listing_description = listing.find('p', {'class': 'small member-search-description'})
        if isinstance(listing_description, Tag):
            listing_description = listing_description.text.strip()
        listing_location = listing.find('span', {'class': 'small member-search-location rmargin rpad'})
        if isinstance(listing_location, Tag):
            listing_location = listing_location.text.strip()
        full_dict = {'Title': listing_name, 'Description': listing_description, 'Location': listing_location}
        data.append(full_dict)
So we initialize res to None, check whether it is still None, and repeat the request if it is. Whenever a requests.exceptions.Timeout exception is raised, we catch it and wait 5 seconds before going back to the top of the while loop.
If requests raises a different exception, you can try replacing the except line with:
except requests.exceptions.RequestException:
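Note that the loop above keeps retrying forever if the server never answers. A minimal sketch of a bounded version of the same idea could look like the following. The name fetch_with_retry and its parameters (fetch, retry_on, max_attempts, wait) are hypothetical and not part of requests. The helper uses only the standard library, so the caller passes in the exception to retry on.

```python
import time


def fetch_with_retry(fetch, retry_on, max_attempts=5, wait=5.0):
    """Call fetch() until it succeeds or max_attempts is reached.

    fetch: zero-argument callable that performs the request (hypothetical name).
    retry_on: exception class, or tuple of classes, that should trigger a retry.
    Returns whatever fetch() returns, or None if every attempt raised retry_on.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on:
            # Only sleep if another attempt is still coming.
            if attempt < max_attempts:
                time.sleep(wait)
    return None
```

Assuming the s and x from the question, the call inside the page loop might be res = fetch_with_retry(lambda: requests.get(s+str(x), timeout=20), requests.exceptions.Timeout), and a page whose res is still None can be skipped with continue. Passing requests.exceptions.RequestException instead should also cover connection errors, since Timeout is a subclass of it.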
Upvotes: 1