Reputation: 47
How can I call the request function more than once if I want to scrape a website, let's say every one to five minutes, autonomously? I was using a do-while loop, but it does not wait for the code to complete before running again; it just skips everything.
var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');

var urls = [];
var counter = 0;

do {
    // the loop does not wait for this callback to finish before looping again
    request('http://www.google.com', function(err, resp, html) {
        if (!err && resp.statusCode == 200) {
            var $ = cheerio.load(html);
            var url = $('b')[0].children[0].data;
            urls.push(url);
            console.log(url);
            fs.writeFile("test.txt", urls, function(err) {
                if (err) {
                    return console.log(err);
                }
                console.log("The file was saved!");
            });
        }
    });
} while (counter == 0);
Upvotes: 0
Views: 45
Reputation: 1785
To simply solve the problem you are having, you should look into setInterval instead of trying to use a modulus on the minutes of a Date object.
Something like:
setInterval(scrape, 1000 * 60); //1000ms = 1 second. 1 second * 60 = 1 minute
will work if you have your request logic inside of a function called scrape.
If you want to build a more sophisticated tool you can check out the link in the other answer, otherwise this should get you unblocked.
Hope this helps!
Upvotes: 1
Reputation: 5393
Node is asynchronous, and that is why you see it skipping everything inside your loop. Instead of putting your implementation inside a loop, I would advise you to check out some well-known Node modules that are designed to step through those operations and make it easy to structure your code in a nice "async" fashion, like async, or Q if you are a fan of JavaScript promises instead of callbacks.
Furthermore, if you wish to scrape the web, there are plenty of scraping modules which might be useful for your situation.
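As a minimal sketch using the async module's forever helper (assuming the request, cheerio, and async packages are installed), each new run only starts after the previous request has finished:

var request = require('request');
var cheerio = require('cheerio');
var async = require('async');

// async.forever keeps calling the task until an error is passed to next,
// so the next iteration only begins once the current callback has run
async.forever(
    function(next) {
        request('http://www.google.com', function(err, resp, html) {
            if (!err && resp.statusCode == 200) {
                var $ = cheerio.load(html);
                console.log($('b').first().text());
            }
            // wait one minute before kicking off the next scrape
            setTimeout(function() { next(); }, 1000 * 60);
        });
    },
    function(err) {
        console.log('Scraping stopped:', err);
    }
);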
Upvotes: 0