user1305760


Extract all images from an external site [Node.js]

I'm working on code that takes all the images from a website and then sends them to the browser as a string, but it doesn't work!

I'm trying to use the http module to create a server, fetch the main page of Pinterest, match all the image tags, store each match in an array, and finally send that array.

This is the code:

var http = require('http')
  , options = {
        host: 'www.pinterest.com'
      , port: 80
      , path: '/'
      , method: 'GET'
    }
  , images = [ ]
  ;


http.createServer( function ( request, response ) {

  http.request( options, function ( res ) {
    res.setEncoding( 'utf8' );
    res.on( 'data', function ( chunk ) {

      matches.push( chunk.match(/<img[^>]+src="([^">]+)/g) );

    });
  }).on('error', function(e) {
    console.log('problem with request: ' + e.message);
  });

  response.writeHead( 200, { 'Content-Type' : 'text/html' } );

  response.end( images.toString() );

}).listen(8888);

I don't get any errors in the console, but after about a minute it prints:

problem with request: socket hang up

Upvotes: 0

Views: 2396

Answers (2)

gustavohenke

Reputation: 41440

Even if you have already solved your problem, the cheerio package makes this much easier. It's the best jQuery-like package for Node I've seen, and it's very complete.

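You would load the remote HTML and then select the images, for example:

// .attr() returns only the first element's src, so map over every <img>
var imageUrls = $("img").map(function () { return $(this).attr("src"); }).get();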

Also, matching the HTML inside the data event is a problem: each chunk is only part of the page, so a chunk can end in the middle of an img tag and that image will be missed.
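A minimal sketch of that chunk problem, runnable without any network access. The chunks array is hypothetical and stands in for what the data event might deliver, with the second img tag split across a chunk boundary. Matching each chunk separately misses that tag, while joining the chunks first (which is what matching in the end handler amounts to) finds both. It also uses matchAll() (Node 12+) to keep only the captured src group, since match() with the g flag returns the whole matched text rather than the group.

```javascript
// Regex from the question: capture group 1 is the value of src="..."
const IMG_SRC = /<img[^>]+src="([^">]+)/g;

// Extract src values from one HTML string.
// matchAll() yields every match with its capture groups; m[1] is the src.
function extractSrcs(html) {
  return Array.from(html.matchAll(IMG_SRC), (m) => m[1]);
}

// Hypothetical chunks: the second <img> tag is split across two chunks.
const chunks = [
  '<p><img alt="a" src="/a.png"> text <im',
  'g alt="b" src="/b.png"></p>',
];

// Matching inside each 'data' event: the split tag is never seen whole.
const perChunk = chunks.flatMap(extractSrcs);

// Buffering every chunk and matching once the body is complete ('end').
const buffered = extractSrcs(chunks.join(""));

console.log(perChunk); // [ '/a.png' ]
console.log(buffered); // [ '/a.png', '/b.png' ]
```

The same reasoning applies to cheerio: pass it the complete body, not individual chunks.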

Upvotes: 1

udidu

Reputation: 8588

I think you have a problem with your regex. Anyway, this approach will get you the data:

var http = require('http')
  , options = {
      host: 'pinterest.com'
    , port: 80
    , path: '/'
    , method: 'GET'
    };

http.createServer( function ( request, response ) {

  // Collect matches per incoming request so they don't pile up across requests
  var images = [ ];

  // http.get() sets the method to GET and calls req.end() automatically
  var req = http.get(options, function(res){
    res.setEncoding('utf8');
    res.on('data', function (chunk) {
      // match() returns null when a chunk contains no <img> tag
      var found = chunk.match(/<img[^>]+src="([^">]+)/g);
      if (found) images = images.concat(found);
    }).on('end', function(){
      // Respond only after the whole remote page has been received
      response.writeHead( 200, { 'Content-Type' : 'text/plain' } );
      response.end(images.join('\n'));
    });
  });

  req.on('error', function(error){
    console.log('error: ' + error.message);
    response.writeHead( 502, { 'Content-Type' : 'text/plain' } );
    response.end('error: ' + error.message);
  });

}).listen(8888);

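Here I used http.get instead of http.request. Unlike http.request, http.get calls req.end() for you; your code created the request but never called end(), so it was never actually sent. I also write the response inside the end handler, so it is only sent after the whole page has arrived.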
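If you would rather keep http.request, here is a minimal sketch. fetchPage is a hypothetical helper, not part of Node's API; it sends the request explicitly with req.end() and buffers the complete body before calling back, so the regex (or cheerio) always sees whole tags.

```javascript
const http = require("http");

// Hypothetical helper: GET a page with http.request() and pass the
// complete body to done(err, body).
function fetchPage(options, done) {
  const req = http.request(options, (res) => {
    let body = "";
    res.setEncoding("utf8");
    // Accumulate chunks; a single chunk may end in the middle of a tag.
    res.on("data", (chunk) => {
      body += chunk;
    });
    // The body is complete only once 'end' fires.
    res.on("end", () => done(null, body));
  });
  req.on("error", (err) => done(err));
  // Unlike http.get(), http.request() does not send the request until end() is called.
  req.end();
}
```

Inside the createServer handler you would call fetchPage(options, ...) and call response.end() only from its callback, once the body has arrived.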

Upvotes: 0
