jbhelfrich

Reputation: 410

Node: close http.get response even though server sends keep-alive

So we're about to move a large amount of content from one site to another site, and all the content will have different paths. The webserver will be using 301 redirects to make sure that people with existing bookmarks get to the new resource. I've been asked to write a script that will test that all the redirects have been set up properly.

The expected redirects will be in a text file with this format:

/path/to/resource/1 http://www.newsite.com/new/path/to/resource1    
/path/to/resource/2 http://www.newsite.com/new/path/to/resource2

This will be a very large file, so I wrote a node script that uses line-reader to pull each line out of the file, and pass it to a function that does the actual check.

It works fine for files up to five lines long. If the file has more than five entries, it still loops through the whole file and calls the check function each time (I've used console.log to confirm this), but only the first five return: the code below logs "Calling check301 for..." for each line in the file, but only the first five reach the "Getting..." log statement. I've tried increasing timeouts, I check for errors on the http.get call, and I added code to catch any unhandled exceptions. Nada.

What am I missing?

EDIT: So apparently what I'm missing is that http defaults to five sockets available at a time (http://nodejs.org/api/http.html#http_agent_maxsockets) AND my server is sending keep-alives. Is there a way to force the connection to ignore the keep-alive header, or to destroy the connection once I'm done processing the response?

/* Check a provided list of URL pairs for redirection.
 * redirects.txt should have one line per redirect, with the url to
 * be requested and the URL to be redirected to separated by a space.
 */
var urlBase = "http://www.example.com",
    testPair = [],
    http = require('http'),
    lineReader = require('line-reader');

function check301(source, destination){
  console.log('Calling check301 for ' + source);
  var target = urlBase + source;
  http.get(target, function(response){
    console.log('Getting ' + source);
    if (response.statusCode != 301 ||
        response.headers.location != destination){
      console.log(source + ' does not redirect to ' + destination);
    }
  }).on('error', function(e){
    console.log(e.message);
  });
}

//Throttled version.  No more than 5 reqs a second to keep the server happy.
lineReader.open('redirects.txt', function(reader){
  var interval = setInterval(function(){
    if(reader.hasNextLine()){
      reader.nextLine(function(line){
        testPair = line.split(' ');
        check301(testPair[0], testPair[1]);
      });
    } else {
      clearInterval(interval);
      console.log('Done');
    }
  }, 200);
});

Upvotes: 1

Views: 175

Answers (1)

zamnuts

Reputation: 9582

Set the agent property to false to force Connection: close (I recommend this only for your specific case, but not as a default go-to option): http://nodejs.org/api/http.html#http_http_request_options_callback

IIRC, not using Node.js HTTP's underlying default Agent will also mitigate the pooling "problem" you are observing.

Bonus info: Simply throttling the number of requests to 5/sec as you have done via an interval is not good enough. You need to wait for your http.get calls to callback before you begin the next one. In the case where it takes longer than 1 second to capture the response and close the connection, your request rate will exceed 5 per second. I recommend something similar to async's parallel limit control flow: https://github.com/caolan/async#parallellimittasks-limit-callback
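For illustration, here is a minimal hand-rolled version of that control flow (async's parallelLimit does this more robustly; the function name and task shape below are made up for the sketch): at most `limit` tasks run at once, and a new one starts only when a running task calls back.

```javascript
// Run callback-style tasks with bounded concurrency.
// Each task is a function(cb) that calls cb() when finished.
function runLimited(tasks, limit, done) {
  var next = 0, active = 0, finished = false;
  function launch() {
    while (active < limit && next < tasks.length) {
      active++;
      tasks[next++](function () {
        active--;
        if (next === tasks.length && active === 0 && !finished) {
          finished = true;
          done();
        } else {
          launch(); // a slot opened up; start the next task if any remain
        }
      });
    }
  }
  if (tasks.length === 0) return done();
  launch();
}

// Usage sketch with dummy asynchronous tasks:
var tasks = [1, 2, 3, 4, 5, 6].map(function (n) {
  return function (cb) {
    setTimeout(function () { console.log('task ' + n); cb(); }, 10);
  };
});
runLimited(tasks, 2, function () { console.log('all done'); });
```

The key difference from the interval approach in the question: the limiter reacts to completions rather than to the clock, so the request rate can never outrun the server's responses.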

Upvotes: 1
