Reputation: 343
I'm currently writing a crawler in java, and I'm stuck by something.
In my crawler, I have threads downloading a static distant page, using HttpURLConnection
.
I tried to download one small file (2kb) with different parameters. The connection has a timeout set to 1s.
I noticed that, if I use 100 threads for the download, I suceed in making 3 times more request per second (~10k requests per second, which use ), whereas when using 500 threads I suceed in making "only" 4k requests per second.
I would have expected to be able to do at least as many request per second as with 100 threads.
Could you explain me why is this behaving this way, and if there is some parameter to activate somewhere to increase the maximum number of parallel connection ?
Thanks :)
Upvotes: 1
Views: 214
Reputation: 602
i think it's just a matter of your cpu, at a certain point switching threats is more expensive then the time gained by not waiting for a single connection.
i would try to maximize parralel connection by setting a upper limit
Upvotes: 1