Reputation: 502
I am a freshman at CMU who is completely lost in my first term project, and I would really appreciate your help :)
I am writing a scraping tool, and sometimes a request simply never responds: it returns nothing, not even an error. As a result, my scraper gets stuck on one URL instead of recognizing the hang and moving on. Here is the code:
import time
import requests
from bs4 import BeautifulSoup

def extractHTML(url):
    startTime = time.time()
    headers = requests.utils.default_headers()
    headers.update(
        {'User-Agent':
         'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'})
    paper1Link = requests.get(url, headers=headers)
    paper1Content = BeautifulSoup(paper1Link.content, "lxml")
    return str(paper1Content)
Upvotes: 1
Views: 178
Reputation: 15310
The requests documentation has a section called "Timeouts". Perhaps you should start there.
paper1Link = requests.get(url, headers=headers, timeout=0.4)
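To make the scraper actually move on rather than crash, you also need to catch the exception that `requests` raises when the timeout expires. Here is a sketch of how the asker's `extractHTML` could be adapted; the timeout values are illustrative, and catching the broad `RequestException` (the parent of `Timeout`, `ConnectionError`, etc.) is one design choice, not the only one:

```python
import requests
from requests.exceptions import RequestException
from bs4 import BeautifulSoup

def extractHTML(url):
    headers = requests.utils.default_headers()
    headers.update(
        {'User-Agent':
         'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'})
    try:
        # timeout=(connect, read): wait up to 5 s to establish the
        # connection and up to 10 s between bytes of the response
        response = requests.get(url, headers=headers, timeout=(5, 10))
    except RequestException:
        # Timed out (or otherwise failed): signal the caller to skip this URL
        return None
    return str(BeautifulSoup(response.content, "lxml"))
```

The caller can then check for `None` and continue with the next URL in its queue instead of hanging forever on one request.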
Upvotes: 2