Nick M

Reputation: 239

Need advice on how to speed up web scraper

I am still pretty new to this. I am trying to pull data from web pages, but the method I have implemented seems a bit slow. I used the time module to narrow down where the time goes.

requests.get(url)

took the majority of the time (1-5 seconds)

soup = BeautifulSoup(data.content)

took about 0.15 seconds consistently

Is requests always this slow? Could the problem be that Yahoo rate-limits requests to its servers? Right now it takes around 2-5 seconds to pull a single stock price from yahoo.com/finance, and the requests.get call accounts for most of that. Any ideas?

Upvotes: 2

Views: 2356

Answers (2)

nivix zixer

Reputation: 1651

Just adding to what Meghdeep said:

If you are pulling many URLs one after another, you should try rewriting your code to be asynchronous. The time it takes to scrape one page will not change, but asynchronously you can have many requests in flight at once, so the total time for the whole batch drops. (You can use the Python Twisted or Tornado frameworks for this, or you can rewrite your scraper with Node.js.)
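To illustrate the idea of having many requests in flight, here is a minimal sketch. Note that it uses a thread pool from the Python standard library instead of Twisted or Tornado, so it is a swapped-in technique and not what those frameworks do internally. The names fetch, fetch_all, and max_workers are made up for this example, and it assumes the requests library is installed.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Download one page and return its text (hypothetical helper)."""
    # Imported here so the rest of the sketch works even without requests.
    import requests
    return requests.get(url, timeout=10).text


def fetch_all(urls, fetch_one=fetch, max_workers=8):
    """Fetch every URL concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch_one in worker threads, so the waits on the
        # network overlap instead of adding up one after another.
        return list(pool.map(fetch_one, urls))
```

You would call it as fetch_all(list_of_urls). Each individual request still takes as long as before; only the waiting overlaps. Keep max_workers small if the site rate-limits you.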

Upvotes: 3

Meghdeep Ray

Reputation: 5537

The issue is not with Requests. If it's slow, it might be your network connection, or, as you rightly pointed out, Yahoo may be rate-limiting your requests. Many websites publish a robots.txt file that spells out their policy on web scrapers and other automated access. A single request shouldn't take this long regardless, so I would put it down to connection speed. Try opening the URL in your browser and check how long it takes to load.
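If you want to check a site's robots.txt from code, the standard library has a parser for it. This is only a sketch: the sample rules, the "my-scraper" user agent, and the example.com URLs are invented for illustration and are not Yahoo's actual policy.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content, used here so the sketch runs offline.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt):
    """Return a RobotFileParser loaded from robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)
# Paths under /private/ are disallowed for every user agent.
blocked = parser.can_fetch("my-scraper", "https://example.com/private/data")
# Any other path is allowed by these sample rules.
allowed = parser.can_fetch("my-scraper", "https://example.com/quote")
# Crawl-delay is a non-standard directive; None is returned if it is absent.
delay = parser.crawl_delay("my-scraper")
```

Against a live site you would call parser.set_url(...) with the site's robots.txt address and then parser.read() instead of parsing a string.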

A GET request is what you send to a website when you want to "GET" a webpage from it. The same GET is sent when you enter the URL into your browser and hit Enter. So unless there is a marked difference between the time Requests takes to get the page and the time your browser takes, the problem is the internet connection speed itself.
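To get a number you can compare against the browser's load time, you could time the call directly. A minimal sketch follows; timed is a hypothetical helper name, not part of Requests.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, response, seconds = timed(requests.get, url, timeout=10) gives you the wall-clock time for one fetch. Requests also records response.elapsed, the time between sending the request and receiving the response headers, which is a useful second reading.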

Upvotes: 3
