David542

Reputation: 110113

Using curl vs Python requests

When doing a scrape of a site, which would be preferable: using curl, or using Python's requests library?

I originally planned to use requests and explicitly specify a user agent. However, when I do this I often get an "HTTP 429 Too Many Requests" error, whereas with curl I don't seem to hit it.
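
Roughly what I'm doing with requests, for reference (the URL and User-Agent value here are just placeholders):

    import requests

    # Placeholder URL and User-Agent string; swap in the real ones.
    headers = {"User-Agent": "Mozilla/5.0 (compatible; metadata-updater/1.0)"}
    response = requests.get("https://example.com/title/12345", headers=headers)
    response.raise_for_status()  # this is where the HTTP 429 shows up
    data = response.text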

I need to update metadata for 10,000 titles, and I need a way to pull down the information for each title in a parallelized fashion.

What are the pros and cons of using each for pulling down information?

Upvotes: 4

Views: 3011

Answers (3)

vonbrand

Reputation: 11791

I'd go for the in-language version (requests) over an external program any day, because it's less hassle.

Only if that turned out to be unworkable would I fall back to calling out to curl. Always consider that people's time is infinitely more valuable than machine time. Any "performance gains" in such an application will probably be swamped by network delays anyway.

Upvotes: 0

poy

Reputation: 10507

Using requests lets you do everything programmatically within Python, which should result in a cleaner product.

If you use curl, you're shelling out to an external process (e.g. via os.system or subprocess), which is slower and gives you less to work with than a proper response object.
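
A rough sketch of the difference, with a placeholder URL (subprocess is used here instead of os.system so the output can be captured):

    import subprocess
    import requests

    url = "https://example.com/title/12345"  # placeholder

    # Shelling out to curl: a new process is spawned for every request.
    result = subprocess.run(["curl", "-s", url], capture_output=True, text=True)
    body_from_curl = result.stdout

    # Staying in-process with requests: no process overhead, and you get a
    # full response object (status code, headers, encoding) to work with.
    response = requests.get(url)
    body_from_requests = response.text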

Upvotes: 2

Ian Stapleton Cordasco

Reputation: 28747

Since you want to parallelize the requests, you should use requests with grequests (if you're using gevent) or erequests (if you're using eventlet). You may have to throttle how quickly you hit the website, though, since they may do some rate limiting and refuse you for requesting too much in too short a period of time. See the sketch below.
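
A minimal sketch with grequests (assuming gevent); the URLs, User-Agent, and the save_metadata() handler are placeholders for your 10,000 titles:

    import grequests

    def save_metadata(body):
        """Placeholder: parse the page and update your metadata store."""
        pass

    # Placeholder URLs for the 10,000 titles.
    urls = ["https://example.com/title/%d" % i for i in range(10000)]

    unsent = (grequests.get(u, headers={"User-Agent": "metadata-updater/1.0"})
              for u in urls)

    # size= caps how many requests are in flight at once; keeping it small is
    # a crude throttle that helps avoid the site's rate limiting (HTTP 429).
    for response in grequests.imap(unsent, size=10):
        if response is not None and response.ok:
            save_metadata(response.text)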

Upvotes: 3
