Node.js request not returning HTML on specific websites

Question

I am trying to write a web scraper for a NYC database of buildings, and I need the HTML of the actual page. For some reason, when I request the URL I want to scrape, my program does nothing. When I request almost any other website, I get the HTML I asked for. Is this because I am trying to scrape a government site?

var request = require("request");

request(
    { uri: "http://a810-bisweb.nyc.gov/bisweb/JobsQueryByNumberServlet?requestid=3&passjobnumber=123768556&passdocnumber=01" },
    function(error, response, body) {
        console.log(body);
        console.log("hello")
    }
);

I expected to receive the HTML as a string printed in my console. Instead, I get nothing; even the "hello" is not printed. However, when I try any other site, I get the actual HTML string.

user254694 · Accepted Answer

The URL you are requesting returns a 403 (access denied).

I prefer the event-based (streaming) API for request, so the following code

var request = require("request");
request
  .get("http://a810-bisweb.nyc.gov/bisweb/JobsQueryByNumberServlet?requestid=3&passjobnumber=123768556&passdocnumber=01")
  .on('response', function(response) {
    console.log('Hello');
    console.log(response.statusCode);
    console.log(response.headers['content-type']);
  })
  .on('error', function(error){
    console.log(error);
  })

will print out

Hello
403
text/html

My guess is that you get the 403 because the site sets cookies or keeps some session state, and you are going directly to the URL you want instead of hitting the front page first. I also get the 403 in the browser if I go directly to the URL, but if I visit the front page first and then the URL, I get the page.
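If that guess is right, one way to reproduce the browser behaviour is to request the front page first, keep whatever cookies it sets, and send them along with the second request. The following is only a minimal sketch under that assumption and has not been verified against this site. It uses Node's built-in http module rather than request, and the helper names (cookieHeaderFrom, get, fetchWithSession) are made up for illustration. The site may also check other things, such as the User-Agent or Referer header, which this sketch does not handle. With the request library itself, passing the jar: true option should give similar cookie handling.

```javascript
// Sketch only: uses Node's built-in http module, not the request library.
// All function names below are hypothetical helpers, not part of any library.
const http = require("http");

// Turn an array of Set-Cookie header values into a single Cookie header value.
// Only the leading name=value pair of each entry is kept; attributes such as
// Path or HttpOnly are dropped.
function cookieHeaderFrom(setCookieHeaders) {
  return (setCookieHeaders || [])
    .map(function (entry) { return entry.split(";")[0].trim(); })
    .filter(function (pair) { return pair.length > 0; })
    .join("; ");
}

// GET a URL, optionally sending a Cookie header, and resolve with the
// status code, response headers and body.
function get(url, cookie) {
  return new Promise(function (resolve, reject) {
    const headers = cookie ? { Cookie: cookie } : {};
    http.get(url, { headers: headers }, function (res) {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", function (chunk) { body += chunk; });
      res.on("end", function () {
        resolve({ statusCode: res.statusCode, headers: res.headers, body: body });
      });
    }).on("error", reject);
  });
}

// Visit the front page first, then request the target page with whatever
// cookies the front page set (assumption: the session lives in cookies).
function fetchWithSession(frontPageUrl, targetUrl) {
  return get(frontPageUrl).then(function (front) {
    return get(targetUrl, cookieHeaderFrom(front.headers["set-cookie"]));
  });
}
```

To try it, call fetchWithSession(frontPageUrl, targetUrl).then(function (res) { console.log(res.statusCode); }) with the site's front page as the first argument and the servlet URL from the question as the second. If the status is still 403, the cookie theory is probably not the whole story.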
