mito

Reputation: 61

How to scrape a website using node.js with ASP and AJAX

I'm having the following issue: I need to scrape the following web page form.

This web page is for vehicle technical reviews. You can try it with the license plate CDSR70.

As mentioned, I'm using node.js, and my package.json file is the following:

{
  "name": "test",
  "dependencies": {
    "express": "^3.4.8",
    "express.io": "^1.1.13",
    "swig": "^1.3.2",
    "connect-redis": "^1.4.7",
    "request": "^2.34.0",
    "cheerio": "^0.13.1",
    "urllib": "^0.5.8"
  }
}

I'm also using Firebug to see which parameters are being sent to the server, but this form is apparently submitted using AJAX, so Firebug hasn't been much help.

This is the code I'm trying to run:

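var urllib = require('urllib');
var cheerio = require('cheerio');

urllib.request('http://www.prt.cl/Paginas/RevisionTecnica.aspx', {
    method: 'POST',
    data: {ppu: 'CDSR70'}
}, function(err, data, res) {
    if (!err && res.statusCode == 200) {
        // Parse the returned HTML and print the text of the result panel.
        var $ = cheerio.load(data);
        $('#resultPanel').each(function() {
            console.log($(this).text().trim());
        });
    } else {
        // TODO: handle errors properly. err is null when only the status code is wrong.
        throw err || new Error('Unexpected status code: ' + res.statusCode);
    }
});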

This is the HTML element that contains the results table:

<div id="resultPanel" style="display: block;">

What I'm trying to scrape is the entire results table, that is, the vehicle information (Información del vehículo) and every garage where a given vehicle has been examined (Información de Revisión Técnica). The main problem is that I'm getting only this text:

Pinche para ver información de Revisión Técnica





                    Pinche para ver información de Planta de Revisión Técnica













                    Mapa de Ubicación de PRT

As you can see, the trim() function is not working either. Any help or suggestions are welcome. Thanks.
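A note on the trim() observation: trim() only removes whitespace at the start and end of a string, so the runs of blank lines between the pieces of text survive. A minimal sketch of one way to collapse them, using plain JavaScript with no extra packages. The helper name collapseWhitespace is hypothetical, and the sample string is a shortened copy of the output above:

```javascript
// Hypothetical helper: replace every run of whitespace (spaces, tabs,
// newlines) with a single space, then strip what is left at the ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Shortened sample resembling the text returned by $(this).text().
var raw = 'Pinche para ver información de Revisión Técnica\n\n\n\n' +
          '                    Mapa de Ubicación de PRT\n';

console.log(collapseWhitespace(raw));
// prints "Pinche para ver información de Revisión Técnica Mapa de Ubicación de PRT"
```

In the scraping code this would wrap the existing call, for example collapseWhitespace($(this).text()). It only tidies the output; it does not make the missing table data appear.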

EDIT: If I change the method from POST to GET, I get the same result.

Upvotes: 0

Views: 2415

Answers (2)

mito

Reputation: 61

I finally got an answer from a Facebook group. The URL was wrong: the form does not post to the .aspx page but to a web service endpoint. The request should be:

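urllib.request('http://www.prt.cl/infovehiculomttwsNew.asmx/infoVehiculoMTT', {
    method: 'POST',
    data: {ppu: 'CDSR70'}
}, function(err, data, res) {
    if (!err && res.statusCode == 200) {
        // Load the response and print the text of every element in it.
        var $ = cheerio.load(data);
        $('*').each(function() {
            console.log($(this).text());
        });
    } else {
        // TODO: handle errors properly. err is null when only the status code is wrong.
        throw err || new Error('Unexpected status code: ' + res.statusCode);
    }
});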
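For reference, here is a rough sketch of the same POST using only Node's built-in http and querystring modules, which makes it explicit what goes over the wire. It assumes the endpoint accepts a form-encoded body and answers with UTF-8 text; neither is confirmed above. buildPostRequest and fetchVehicleInfo are hypothetical helper names:

```javascript
var http = require('http');
var querystring = require('querystring');

// Hypothetical helper: build the options and body for a form-encoded POST.
// Assumption: the service expects application/x-www-form-urlencoded.
function buildPostRequest(ppu) {
  var body = querystring.stringify({ ppu: ppu });
  return {
    options: {
      hostname: 'www.prt.cl',
      path: '/infovehiculomttwsNew.asmx/infoVehiculoMTT',
      method: 'POST',
      headers: {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Content-Length': Buffer.byteLength(body)
      }
    },
    body: body
  };
}

// Hypothetical helper: send the request and call callback(err, text)
// with the raw response text. Assumption: the response is UTF-8.
function fetchVehicleInfo(ppu, callback) {
  var request = buildPostRequest(ppu);
  var req = http.request(request.options, function (res) {
    var chunks = [];
    res.on('data', function (chunk) { chunks.push(chunk); });
    res.on('end', function () {
      var text = Buffer.concat(chunks).toString('utf8');
      if (res.statusCode !== 200) {
        return callback(new Error('Unexpected status code: ' + res.statusCode), text);
      }
      callback(null, text);
    });
  });
  req.on('error', callback);
  req.end(request.body);
}
```

Calling fetchVehicleInfo('CDSR70', function (err, text) { ... }) would then hand the raw response to the callback, where it can be passed to cheerio.load as in the code above.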

Upvotes: 0

prototype

Reputation: 3313

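Have a look at PhantomJS (http://phantomjs.org/) and CasperJS (http://casperjs.org/). PhantomJS is a headless WebKit browser that is scripted in JavaScript, and CasperJS is a scripting utility built on top of it. Both can execute the page's JavaScript, so you should be able to scrape that site with them.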

Upvotes: 1
