John Doe

Reputation: 47

Using the request module

How can I call the request function more than once if I want to scrape a website, let's say every one to five minutes, autonomously? I was using a do-while loop, but it does not wait for the code to complete before running again; it just skips everything.

do {
    request('http://www.google.com', function(err, resp, html) {
        if (!err && resp.statusCode == 200) {
            var $ = cheerio.load(html);
            url = $('b')[0].children[0].data;
            urls.push(url);
            console.log(url);

            fs.writeFile("test.txt", urls, function(err) {
                if (err) {
                    return console.log(err);
                }
                console.log("The file was saved!");
            });
        }
    });
} while (counter == 0)

Upvotes: 0

Views: 45

Answers (2)

Brennan

Reputation: 1785

To solve the immediate problem, look into setInterval instead of trying to use a modulus on the minutes of a Date object.

Something like:

setInterval(scrape, 1000 * 60); //1000ms = 1 second.  1 second * 60 = 1 minute

will work if you have your request logic inside of a function called scrape.
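For illustration, here is a rough sketch of how that could be wired up. The names startScraping and fetchPage are made up for this example: fetchPage stands in for the request(...) logic from the question and is assumed to call its callback when it is done. Note that setInterval does not wait for the previous run to finish, so the sketch adds a simple busy flag to skip a tick while a run is still in flight.

```javascript
// Sketch only: startScraping and fetchPage are hypothetical names.
// fetchPage(callback) is assumed to call callback(err, result) when finished.
function startScraping(fetchPage, intervalMs) {
  var busy = false; // true while a run is still in flight

  function scrape() {
    if (busy) return; // previous run has not finished, skip this tick
    busy = true;
    fetchPage(function (err, result) {
      busy = false;
      if (err) {
        console.log(err);
        return;
      }
      console.log(result);
    });
  }

  scrape(); // run once right away; setInterval alone waits one interval first
  var timer = setInterval(scrape, intervalMs);

  // Call the returned function to stop scraping.
  return function stop() {
    clearInterval(timer);
  };
}
```

With the question's code, fetchPage would wrap the request call and invoke its callback after fs.writeFile completes; startScraping(fetchPage, 1000 * 60) would then run it roughly once a minute.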

If you want to build a more sophisticated tool, you can check out the link in the other answer; otherwise, this should get you unblocked.

Hope this helps!

Upvotes: 1

Ma'moon Al-Akash

Reputation: 5393

Node is asynchronous, and that is why it appears to skip everything inside your loop: request() returns immediately and its callback only runs later, once the event loop is free, so the do-while loop never waits for it. Instead of putting your implementation inside a loop, I would advise you to look at some well-known Node modules that are designed to handle this kind of control flow and make it easy to structure your code in a clean "async" fashion, such as async, or Q if you prefer JavaScript promises to callbacks.
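As a rough illustration of the promise style, here is a sketch that uses native promises and async/await rather than the async or Q modules named above. The helpers sleep, randomDelayMs and scrapeRepeatedly are hypothetical names, and scrapeOnce is assumed to be a function that returns a promise which resolves when one scrape (including the file write) has finished. Each run is awaited before the next one is scheduled, which is the "wait until it completes" behaviour the do-while loop could not provide.

```javascript
// Sketch only: sleep, randomDelayMs and scrapeRepeatedly are hypothetical helpers.

// Resolves after roughly `ms` milliseconds without blocking the event loop.
function sleep(ms) {
  return new Promise(function (resolve) {
    setTimeout(resolve, ms);
  });
}

// Random delay in milliseconds between minMinutes and maxMinutes (inclusive).
function randomDelayMs(minMinutes, maxMinutes) {
  var minMs = minMinutes * 60 * 1000;
  var maxMs = maxMinutes * 60 * 1000;
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

// Runs scrapeOnce `rounds` times, one after another.
// scrapeOnce must return a promise; nextDelayMs returns the pause before the next run.
async function scrapeRepeatedly(scrapeOnce, rounds, nextDelayMs) {
  for (var i = 0; i < rounds; i++) {
    try {
      await scrapeOnce(); // wait for this run to finish
    } catch (err) {
      console.log(err); // log the failure and carry on with the next round
    }
    if (i < rounds - 1) {
      await sleep(nextDelayMs());
    }
  }
}
```

For the "every one to five minutes" case in the question, the call could look like scrapeRepeatedly(scrapeOnce, 10, function () { return randomDelayMs(1, 5); }), where the round count of 10 is just an example value.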

Furthermore, if you wish to scrape the web, there are plenty of scraping modules that might be useful for your situation.

Upvotes: 0
