Reputation: 1780
Let's say I need to get content from 5 different websites, 100 pages from each. For instance:
example.com/?a=1,
example.com/?a=2
OR
example.com/a.txt,
example.com/b.txt
Up until now, I have been using curl_multi, and while this is much faster than normal curl, I'm still not completely satisfied with the speed. I was wondering if there is a faster way to get pages from a single domain (connect to the domain once, then grab as much as you can!).
I do not own the domain I am trying to get content from, but I will be throttling my requests.
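For reference, this is roughly the curl_multi pattern I'm using now (a trimmed sketch; the URL list stands in for the real pages):

    <?php
    // One easy handle per URL, all driven in parallel by the multi handle.
    // The URL list is a placeholder for the real 100 pages per site.
    $urls = [];
    for ($i = 1; $i <= 100; $i++) {
        $urls[] = "http://example.com/?a=$i";
    }

    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            // Block until at least one transfer has activity.
            curl_multi_select($mh);
        }
    } while ($active && $status === CURLM_OK);

    foreach ($handles as $ch) {
        $body = curl_multi_getcontent($ch);
        // ... process $body ...
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);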
Upvotes: 0
Views: 102
Reputation: 7878
It depends on the server implementation. Resource-wise it's a good idea to use a single TCP connection, i.e. one HTTP/1.1 persistent connection. But the server will very likely handle those requests sequentially, as HTTP requires the responses to be delivered in the same order as the requests.
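For illustration, a minimal PHP sketch of the single-connection variant (PHP assumed because the question uses curl_multi; the URLs are placeholders). Reusing one cURL handle lets libcurl keep the TCP connection to the host alive between requests:

    <?php
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    for ($i = 1; $i <= 100; $i++) {
        // Only the URL changes; libcurl reuses the open connection
        // to example.com for each request.
        curl_setopt($ch, CURLOPT_URL, "http://example.com/?a=$i");
        $body = curl_exec($ch);
        // ... process $body ...
    }
    curl_close($ch);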
So if those requests need some server-side processing time, you'll probably be faster with parallel requests. If not, I suspect one connection will outperform several, since you save the overhead of opening the extra connections. In the end you'll have to benchmark the different approaches for your resources.
I suspect a mix of both methods will lead to the most time-efficient result, since some resources are delivered instantly while others incur processing latency.
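As a sketch of that mixed approach (again PHP; the pool size of 5 and the URL list are assumptions to tune by benchmarking): keep a small fixed number of transfers in flight with curl_multi, so libcurl can hand a freed connection to the next queued URL on the same host:

    <?php
    $urls = [];
    for ($i = 1; $i <= 100; $i++) {
        $urls[] = "http://example.com/?a=$i";
    }
    $poolSize = 5;   // assumed window size; tune by benchmarking
    $inFlight = 0;

    $mh = curl_multi_init();

    $add = function ($url) use ($mh, &$inFlight) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $inFlight++;
    };

    // Seed the initial window.
    foreach (array_splice($urls, 0, $poolSize) as $url) {
        $add($url);
    }

    do {
        curl_multi_exec($mh, $active);
        curl_multi_select($mh);

        // Harvest finished transfers and refill the window. Because the
        // new easy handle joins the same multi handle, libcurl can reuse
        // the connection that just became free.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $body = curl_multi_getcontent($ch);
            // ... process $body ...
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            $inFlight--;
            if ($urls) {
                $add(array_shift($urls));
            }
        }
    } while ($inFlight > 0);

    curl_multi_close($mh);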
Upvotes: 2