ceth

Reputation: 45325

Get HTML page by URL

Here is my code:

 utilitesRouter.route('/url')
    .post(function(request, response) {
        console.log(request.body.uri);
        var urlOpts = { host: request.body.uri, path: '/', port: '80', method: 'GET' };
        var re = /(<\s*title[^>]*>(.+?)<\s*\/\s*title)>/gi;
        http.get(urlOpts, function (response) {
            response.on('data', function (chunk) {
                var str=chunk.toString();
                console.log(str);
                var match = re.exec(str);
                if (match && match[2]) {
                    console.log(match[2]);
                }
            });    
        });

        response.json({ url: request.body.uri });    
    });

If I send a POST request with this JSON {"uri":"google.ru" } I get:

302 Moved
google.ru
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.ru/index.html">here</A>.
</BODY></HTML>

If I send a POST request with the JSON {"uri":"http://google.ru" } I get this error message:

events.js:85
      throw er; // Unhandled 'error' event
            ^
Error: getaddrinfo ENOTFOUND http://google.ru
    at errnoException (dns.js:44:10)
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:94:26)

I can open http://google.ru in my browser.

How can I get the HTML using Node.js?

Upvotes: 0

Views: 2717

Answers (2)

SaSchu

Reputation: 513

You get the error because the host attribute in your urlOpts has to be a domain name, like google.ru or www.google.ru. You are passing a full URL instead, and the string http://google.ru cannot be resolved to an IP address via DNS. That is why the error is raised at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:94:26).

If you want to keep using http.get() the way you do, you always have to extract the domain part from the URI you are given, i.e. get google.ru out of http://google.ru and use it as host.
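A minimal sketch of that extraction, assuming a Node.js version that ships the WHATWG URL class in the built-in url module. The helper name toRequestOptions is hypothetical (it is not part of the question's code), and defaulting to http:// when the input has no scheme is an assumption made here so that both google.ru and http://google.ru work:

```javascript
const { URL } = require('url');

// Hypothetical helper: turns user input such as "google.ru" or
// "http://google.ru/index.html" into an options object for http.get().
function toRequestOptions(uri) {
  // The URL constructor needs a scheme, so assume http:// when none is given.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(uri);
  const parsed = new URL(hasScheme ? uri : 'http://' + uri);

  return {
    host: parsed.hostname,                  // domain only, e.g. "google.ru"
    path: parsed.pathname + parsed.search,  // "/" when the URI has no path
    port: Number(parsed.port) || 80,        // parsed.port is "" when omitted
    method: 'GET'
  };
}
```

In the route from the question, this would replace the hand-built object: var urlOpts = toRequestOptions(request.body.uri);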

Upvotes: 0

greenlikeorange

Reputation: 505

You may want to use the request module for that. It's pretty easy.

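var request = require("request");

// POST is used so that the JSON body ({"uri": "..."}) is available as req.body.uri.
// The URI must include the scheme, e.g. "http://google.ru".
router.post('/proxy', function(req, res, next){
  request.get(req.body.uri, function(error, response, body){
    if (error)
      return next(error);

    // body holds the fetched page's HTML as a string
    res.send(body);
  });
});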

request also supports streaming and other cool features.

Upvotes: 2
