Reputation: 119
I have a web crawler that is working fine, but I have an issue: once I reach 90 requests, I need to pause for 5 or more minutes.
My example code:
for (let i = 0; i < maxPag; i++) {
  setTimeout(function() {
    console.log(i);
    getInfoPage(i);
  }, 15000 * (i + 1));
}
In the example above I give each page request its own timeout, but after every 90 requests I need a longer pause.
How can I add an extra delay at the 90th request, then at the 180th, and so on until my for loop ends?
Upvotes: 3
Views: 197
Reputation: 327
let extraTimeOut = 0;
for (let i = 0; i < maxPag; i++) {
  if (i && i % 90 === 0) {
    extraTimeOut += 300000;
  }
  setTimeout(function() {
    console.log(i);
    getInfoPage(i);
  }, 15000 * (i + 1) + extraTimeOut);
}
Upvotes: 0
Reputation: 138237
You could use one loop that waits longer at every 90th iteration:
const timer = ms => new Promise(res => setTimeout(res, ms));

(async function() {
  for (let i = 0; i < maxPag; i++) {
    if (i && !(i % 90)) {
      await timer(5 * 60 * 1000);
    } else {
      await timer(15000);
    }
    getInfoPage(i);
  }
})();
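To see the loop above in action without waiting minutes, here is a scaled-down sketch: a hypothetical `crawlAll` helper uses a batch of 3 and millisecond delays in place of 90 / 15 s / 5 min, and simply records the visit order instead of calling the real `getInfoPage`.

```javascript
const timer = ms => new Promise(res => setTimeout(res, ms));

// Scaled-down sketch: `batch`, `shortMs` and `longMs` are demo
// stand-ins for 90, 15000 and 300000; pages are recorded in `order`
// instead of being fetched.
async function crawlAll(maxPag, batch, shortMs, longMs) {
  const order = [];
  for (let i = 0; i < maxPag; i++) {
    if (i && !(i % batch)) {
      await timer(longMs);   // longer pause at each batch boundary
    } else {
      await timer(shortMs);
    }
    order.push(i);           // stands in for getInfoPage(i)
  }
  return order;
}

crawlAll(7, 3, 1, 5).then(order => console.log(order.join(','))); // prints "0,1,2,3,4,5,6"
```

The pages are still requested strictly in order; only the delay between them changes at each batch boundary.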
Upvotes: 2
Reputation: 28137
The easiest, cleanest and recommended solution is to use a recursive function:
var i = 0;
var BATCH_SIZE = 90;
var TIMEOUT_NORMAL = 15000;
var TIMEOUT_LONG = 300000;

function crawl() {
  // Your crawling code
  console.log(i);
  getInfoPage(i);

  // Schedule next call (if there are still pages left)
  if (++i < maxPag) {
    // Use the longer timeout if "i" is divisible by 90
    setTimeout(crawl, i % BATCH_SIZE ? TIMEOUT_NORMAL : TIMEOUT_LONG);
  }
}

// Start the crawler
crawl();
Using a recursive function for the timeout gives you the most control over the loop and also makes the code easier to understand. I used the index i as a global variable in order to simplify the setTimeout call; the code is also slightly more efficient because you don't have to pass any parameters or save variables on the stack. Note that globally defined variables like the ones above are not recommended; you should group them all in a namespace, e.g.:
var crawlOpts = {
  currentIndex: 0,
  BATCH_SIZE: 90,
  TIMEOUT_NORMAL: 15000,
  TIMEOUT_LONG: 300000,
};
And access them like crawlOpts.BATCH_SIZE or crawlOpts.currentIndex++.
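As a sketch of how the namespace version might fit together, a hypothetical nextDelay helper (not part of the answer above) can compute the same short/long choice from crawlOpts:

```javascript
var crawlOpts = {
  currentIndex: 0,
  BATCH_SIZE: 90,
  TIMEOUT_NORMAL: 15000,
  TIMEOUT_LONG: 300000,
};

// Hypothetical helper: delay before the next crawl() call.
// Long pause whenever the index is a multiple of BATCH_SIZE
// (mirroring `i % BATCH_SIZE ? TIMEOUT_NORMAL : TIMEOUT_LONG` above).
function nextDelay(opts) {
  return opts.currentIndex % opts.BATCH_SIZE
    ? opts.TIMEOUT_NORMAL
    : opts.TIMEOUT_LONG;
}

console.log(nextDelay({ ...crawlOpts, currentIndex: 1 }));  // 15000
console.log(nextDelay({ ...crawlOpts, currentIndex: 90 })); // 300000
```

Pulling the choice into a function keeps the setTimeout call readable once the options live in one object.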
Upvotes: 2
Reputation: 664246
Apart from the async/await approaches, you can use the following solution with the naive setTimeout scheduling:
for (let i = 0; i < maxPag; i++) {
  setTimeout(function() {
    console.log(i);
    getInfoPage(i);
  }, 15000 * (i + 1) + 300000 * Math.floor(i / 90));
  // ^^^^^             ^^^^^^
  // 15s between each  5m between chunks of 90
}
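To sanity-check the arithmetic, a small helper (hypothetical, just mirroring the delay expression above) shows that the gap between the last request of one chunk and the first of the next is exactly 15 s + 5 min:

```javascript
// Hypothetical helper mirroring the delay formula above:
// 15 s per request, plus 5 min for every completed chunk of 90.
const delayFor = i => 15000 * (i + 1) + 300000 * Math.floor(i / 90);

console.log(delayFor(0));   // 15000   – first request fires after 15 s
console.log(delayFor(89));  // 1350000 – last request of the first chunk
console.log(delayFor(90));  // 1665000 – 315000 ms (15 s + 5 min) later
```

Since all the timers are created up front, each delay must encode the total wait from loop time, not the gap to the previous request.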
Upvotes: 2
Reputation: 11718
Use async/await and run the function recursively:
console.clear();
const maxPag = 10000;
const timeoutAfter = 10;
const timeoutTime = 1000;
const timeout = () => new Promise(resolve => setTimeout(resolve, timeoutTime));

const run = async (i = 0) => {
  console.log(i);
  getInfoPage(i);
  if (i % timeoutAfter === 0) {
    await timeout();
  }
  if (++i < maxPag) {   // stop once every page has been requested
    run(i);
  }
};

run();
Here is a jsfiddle
Upvotes: 0