Oli

Reputation: 1031

How do I make HTTP requests inside a loop in NodeJS

I'm writing a command-line script in Node (because I know JS, I'm bad at Bash, and I need jQuery for navigating the DOM). Right now I'm reading an input file and iterating over each line.

How do I make one HTTP GET request per line so that I can load the resulting HTML with jQuery and extract the information I need from each page? I've tried the httpsync package from npm, so that I could make one blocking GET call per line of my input file, but it doesn't support HTTPS, and of course the service I'm hitting only supports HTTPS.

Thanks!
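A minimal sketch of the per-line loop described above, assuming a modern Node version (one that supports async iteration over readline) and one URL per line in the input file. It uses only the built-in http/https modules, so HTTPS works, and awaiting each request inside the loop handles one line at a time without blocking the event loop. The names get, fetchEachLine and handlePage are hypothetical.

```javascript
// Sketch only: fetch each URL listed in a file, one request at a time.
// `get`, `fetchEachLine`, and the `handlePage` callback are hypothetical names.
const fs = require('fs');
const http = require('http');
const https = require('https');
const readline = require('readline');

// Minimal promise wrapper around http(s).get that resolves with the body text.
function get(url) {
  const client = url.startsWith('https:') ? https : http;
  return new Promise((resolve, reject) => {
    client
      .get(url, (res) => {
        if (res.statusCode !== 200) {
          res.resume(); // drain the body so the socket can be reused
          reject(new Error(`GET ${url} failed with status ${res.statusCode}`));
          return;
        }
        res.setEncoding('utf8');
        let body = '';
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => resolve(body));
      })
      .on('error', reject);
  });
}

// Reads the file line by line; awaiting inside the loop means only one
// request is in flight at a time, without blocking the event loop.
async function fetchEachLine(inputPath, handlePage) {
  const lines = readline.createInterface({
    input: fs.createReadStream(inputPath),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  for await (const line of lines) {
    const url = line.trim();
    if (url === '') continue; // skip blank lines
    const body = await get(url);
    await handlePage(url, body);
  }
}
```

handlePage is where the HTML would be parsed (with cheerio or a similar library). Because the requests run strictly one after another, this is the simplest option but also the slowest for a long file.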

Upvotes: 1

Views: 7440

Answers (3)

Oli

Reputation: 1031

I was worried about making a million simultaneous requests without some kind of throttling to limit the number of concurrent connections, but it seems Node throttles me "out of the box" to around 5-6 concurrent connections.

This is perfect, as it lets me keep my code a lot simpler while also fully leveraging the inherent asynchrony of Node.
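The throttling described above comes from the HTTP agent's maxSockets option, and its default depends on the Node version (5 per host before Node 0.12, Infinity since then, so newer versions do not throttle at all). A minimal sketch that sets the cap explicitly, so the behavior does not depend on the Node version; the value 5 and the getAll helper are illustrative assumptions:

```javascript
// Sketch: an explicit cap on concurrent sockets per host, instead of relying
// on whatever the default agent does in your Node version.
// MAX_SOCKETS = 5 and the getAll helper are illustrative assumptions.
const http = require('http');
const https = require('https');

const MAX_SOCKETS = 5;
const httpAgent = new http.Agent({ keepAlive: true, maxSockets: MAX_SOCKETS });
const httpsAgent = new https.Agent({ keepAlive: true, maxSockets: MAX_SOCKETS });

// Starts every request immediately; the agent queues any request beyond
// MAX_SOCKETS for the same host until a socket frees up.
function getAll(urls) {
  return Promise.all(
    urls.map(
      (url) =>
        new Promise((resolve, reject) => {
          const isHttps = url.startsWith('https:');
          const client = isHttps ? https : http;
          const agent = isHttps ? httpsAgent : httpAgent;
          client
            .get(url, { agent }, (res) => {
              res.setEncoding('utf8');
              let body = '';
              res.on('data', (chunk) => { body += chunk; });
              res.on('end', () => resolve(body));
            })
            .on('error', reject);
        })
    )
  );
}
```

The limit applies per host, and every request object is still created up front, so for a very large input a bounded worker pool (like the async approaches in the other answers) keeps memory use lower.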

Upvotes: 0

nickclaw

Reputation: 698

I would most likely use the async library's eachLimit function. It lets you throttle the number of active connections and also gives you a callback once all the operations are done.

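var async = require('async');
var request = require('request');

// The limit comes second: eachLimit(collection, limit, iteratee, callback).
async.eachLimit(urls, 5, function(url, done) {
    request(url, function(err, res, body) {
        if (err) return done(err); // pass errors on so eachLimit can stop early
        // do something with body
        done();
    });
}, function(err) {
    // called once every URL has been processed, or on the first error
    if (err) return console.error(err);
    console.log('all done!');
});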
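If you would rather not add a dependency, the same bounded-concurrency pattern can be sketched with promises and async/await on a modern Node. The eachLimit below is a hypothetical helper written for illustration, not the async library's implementation; like async.eachLimit, it stops starting new items after the first error and rejects with that error.

```javascript
// Sketch: run `worker` over `items` with at most `limit` calls in flight.
// `eachLimit` is a hypothetical helper written for illustration; it is not
// the async library's code, just the same bounded-concurrency idea.
async function eachLimit(items, limit, worker) {
  let next = 0; // index of the next item nobody has claimed yet
  let failed = false; // set on the first error so no new items start

  // Each runner keeps claiming the next unclaimed item until none are left.
  async function runner() {
    while (!failed && next < items.length) {
      const i = next++;
      try {
        await worker(items[i], i);
      } catch (err) {
        failed = true;
        throw err;
      }
    }
  }

  const runners = [];
  for (let k = 0; k < Math.min(limit, items.length); k++) {
    runners.push(runner());
  }
  // Resolves when every item is done; rejects with the first error.
  await Promise.all(runners);
}
```

With any promise-returning HTTP function, usage looks like `await eachLimit(urls, 5, async (url) => { /* fetch and parse */ })`. Because at most `limit` runners exist, no more than that many requests are ever in flight.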

Upvotes: 1

josh3736

Reputation: 144852

A good way to handle a large number of jobs in a controlled manner is the async library's queue (async.queue).

I also recommend looking at request for making HTTP requests and cheerio, which offers a jQuery-like API, for working with the HTML you get back.

Putting these together, you get something like:

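var async = require('async');
var request = require('request');
var cheerio = require('cheerio');

// Worker: fetch one page and parse it. At most 5 workers run at once.
var q = async.queue(function (task, done) {
    request(task.url, function(err, res, body) {
        if (err) return done(err);
        if (res.statusCode !== 200) {
            return done(new Error('HTTP ' + res.statusCode + ' for ' + task.url));
        }

        var $ = cheerio.load(body); // jQuery-like API over the HTML string
        // ...
        done();
    });
}, 5);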

Then add all your URLs to the queue:

q.push({ url: 'https://www.example.com/some/url' });
// ...
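For comparison, a dependency-free sketch of the same queue idea, assuming a modern Node with promises: tasks can be pushed at any time, at most `concurrency` of them run at once, and each push returns a promise for that task's result. createQueue and its idle() method are hypothetical names chosen for this example; they only loosely mirror async.queue's push and drain.

```javascript
// Sketch: a minimal promise-based work queue in the spirit of async.queue.
// `createQueue`, `push`, and `idle` are hypothetical names, not a library API.
function createQueue(worker, concurrency) {
  const pending = []; // tasks waiting for a free slot
  let running = 0;
  let idleWaiters = [];

  // Start as many waiting tasks as the concurrency limit allows.
  function pump() {
    while (running < concurrency && pending.length > 0) {
      const { task, resolve, reject } = pending.shift();
      running++;
      Promise.resolve()
        .then(() => worker(task))
        .then(resolve, reject)
        .finally(() => {
          running--;
          pump(); // a slot just freed up
        });
    }
    // Nothing queued and nothing running: wake anyone waiting on idle().
    if (running === 0 && pending.length === 0) {
      const waiters = idleWaiters;
      idleWaiters = [];
      waiters.forEach((wake) => wake());
    }
  }

  return {
    // Queue a task; the returned promise settles with the worker's result.
    push(task) {
      return new Promise((resolve, reject) => {
        pending.push({ task, resolve, reject });
        pump();
      });
    },
    // Resolves once the queue is empty and no task is running.
    idle() {
      return new Promise((resolve) => {
        idleWaiters.push(resolve);
        pump();
      });
    },
  };
}
```

Handle each push's promise (for example `q.push(task).catch(console.error)`), since a rejected promise with no handler triggers Node's unhandled-rejection behavior.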

Upvotes: 5
