Reputation: 61
I'm having the following issue: I need to scrape the following web page form.
This web page is for vehicle technical inspections; you can try it with the license plate CDSR70.
I'm using Node.js, and my package.json file is the following:
{
    "name": "test",
    "dependencies": {
        "express": "^3.4.8",
        "express.io": "^1.1.13",
        "swig": "^1.3.2",
        "connect-redis": "^1.4.7",
        "request": "^2.34.0",
        "cheerio": "^0.13.1",
        "urllib": "^0.5.8"
    }
}
I'm also using Firebug to understand which parameters are being sent to the server, but apparently this form is submitted using AJAX, so Firebug hasn't been much help.
This is the code that I'm trying to run:
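var urllib = require('urllib');
var cheerio = require('cheerio');

// POST the license plate to the form page and print the result panel's text.
urllib.request('http://www.prt.cl/Paginas/RevisionTecnica.aspx', {
    method: 'POST',
    data: {ppu: 'CDSR70'}
}, function (err, data, res) {
    if (err) {
        throw err;
    }
    if (res.statusCode !== 200) {
        // TODO: handle non-200 responses properly
        throw new Error('Unexpected status code: ' + res.statusCode);
    }
    // data is a Buffer; convert it to a string before parsing
    var $ = cheerio.load(data.toString());
    $('#resultPanel').each(function () {
        console.log($(this).text().trim());
    });
});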
This is the HTML element that contains the results table:
<div id="resultPanel" style="display: block;">
What I'm trying to scrape is the entire results table, that is, the vehicle information (Información del vehículo) and every inspection garage where a given vehicle has been examined (Información de Revisión Técnica). The main problem is that I'm getting only this text:
Pinche para ver información de Revisión Técnica
Pinche para ver información de Planta de Revisión Técnica
Mapa de Ubicación de PRT
As you can see, the trim() function is not working either.
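A note on the trim() behavior: String.prototype.trim() only strips whitespace at the start and end of the whole string, so line breaks between the nested elements' text survive. A minimal sketch of collapsing the inner whitespace as well, where collapseWhitespace is a hypothetical helper name and not part of cheerio:

```javascript
// Hypothetical helper: collapse every run of whitespace (spaces, tabs,
// newlines) into a single space, then strip the ends.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}
```

It could be used as collapseWhitespace($(this).text()) in place of $(this).text().trim(). This only tidies the output; it does not make the missing table data appear.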
Any help or suggestions are welcome. Thanks.
EDIT: If I change the method from POST to GET, I get the same result.
Upvotes: 0
Views: 2415
Reputation: 61
I finally got an answer from a Facebook group. The URL was wrong: the form posts to a web service rather than to the page itself, so the request should be:
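// POST the license plate directly to the web service the form calls via AJAX.
urllib.request('http://www.prt.cl/infovehiculomttwsNew.asmx/infoVehiculoMTT', {
    method: 'POST',
    data: {ppu: 'CDSR70'}
}, function (err, data, res) {
    if (err) {
        throw err;
    }
    if (res.statusCode !== 200) {
        // TODO: handle non-200 responses properly
        throw new Error('Unexpected status code: ' + res.statusCode);
    }
    // data is a Buffer; convert it to a string before parsing
    var $ = cheerio.load(data.toString());
    // Print the text of every element in the response
    $('*').each(function () {
        console.log($(this).text());
    });
});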
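For reference, here is a rough sketch of what that request looks like using only Node's built-in http and querystring modules. The host, path and ppu parameter come from the URL above; the form-encoded Content-Type is an assumption about what the service accepts, and buildVehicleRequest and fetchVehicleInfo are hypothetical helper names.

```javascript
const http = require('http');
const querystring = require('querystring');

// Build the request options and form-encoded body for a license plate.
// Assumption: the service accepts application/x-www-form-urlencoded.
function buildVehicleRequest(ppu) {
  const body = querystring.stringify({ ppu: ppu });
  return {
    options: {
      hostname: 'www.prt.cl',
      path: '/infovehiculomttwsNew.asmx/infoVehiculoMTT',
      method: 'POST',
      headers: {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Content-Length': Buffer.byteLength(body)
      }
    },
    body: body
  };
}

// Send the request and pass the raw response text to callback(err, text).
function fetchVehicleInfo(ppu, callback) {
  const built = buildVehicleRequest(ppu);
  const req = http.request(built.options, function (res) {
    const chunks = [];
    res.on('data', function (chunk) { chunks.push(chunk); });
    res.on('end', function () {
      const text = Buffer.concat(chunks).toString('utf8');
      if (res.statusCode !== 200) {
        return callback(new Error('Unexpected status code: ' + res.statusCode), text);
      }
      callback(null, text);
    });
  });
  req.on('error', callback);
  req.end(built.body); // write the body and finish the request
}
```

Calling fetchVehicleInfo('CDSR70', function (err, text) { ... }) would hand the raw response to the callback, which could then be passed to cheerio.load as before. Keeping the request building separate from the sending makes it easy to check the body and headers without touching the network.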
Upvotes: 0
Reputation: 3313
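Have a look at PhantomJS (http://phantomjs.org/) and CasperJS (http://casperjs.org/). PhantomJS is a headless WebKit browser that is scripted with JavaScript, and CasperJS is a scripting utility built on top of it. Neither is a Node.js module, but both execute the page's JavaScript, so you should be able to scrape that site with them.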
Upvotes: 1