Reputation: 460
I'm trying to find the best way to target the PDF download link and save the file to the correct directory on my computer. I'm trying to use CasperJS and XPath, as it seems like it's the easiest way.
Here is what I have currently:
var x = require('casper').selectXPath;
var fs = require('fs');
casper.start('http://www.regulations.gov/#!documentDetail;D=APHIS-2012-0047-0291');
var classVal = x("//a[@class='gwt-Anchor']/@href");
casper.download(classVal, 'C:/users/bnickerson/desktop/script/result/p.pdf');
Whenever this runs, it downloads a file, but it's an HTML file that is just named p.pdf. If I open it, I get this:
HTTP Status 404 - /%5Bobject%20Object%5D
type Status report
message /%5Bobject%20Object%5D
description The requested resource (/%5Bobject%20Object%5D) is not available.
JBoss Web/7.0.17.Final
The page that I'm trying to get this PDF download from: http://www.regulations.gov/#!documentDetail;D=APHIS-2012-0047-0291
Upvotes: 1
Views: 3569
Reputation: 61892
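You should look more closely at which arguments download accepts. Don't mix selectors and plain strings. classVal is an XPath selector object, not the text that the selector points to. Because an object was passed where a URL string was expected, it was coerced to the string "[object Object]", which is why the 404 page reports /%5Bobject%20Object%5D. You can retrieve an element's attribute using getElementAttribute.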
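casper.then(function(){
    // Select the anchor element itself (not its @href) and narrow it to the PDF link
    var classVal = x("//a[@class='gwt-Anchor' and contains(@href,'contentType=pdf')]");
    // Read the href attribute as a plain string
    var url = casper.getElementAttribute(classVal, "href");
    // download expects a URL string as its first argument
    casper.download(url, 'C:/users/bnickerson/desktop/script/result/p.pdf');
});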
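To see where the /%5Bobject%20Object%5D in the 404 page comes from, here is a minimal sketch in plain JavaScript that does not need CasperJS. The selector variable is a hypothetical stand-in for what selectXPath returns; its exact shape is an assumption made for illustration, and the only point is that it is an object rather than a string.

```javascript
// Hypothetical stand-in for the object returned by selectXPath();
// the exact shape is an assumption, it only needs to be a plain object.
var selector = { type: 'xpath', path: "//a[@class='gwt-Anchor']/@href" };

// Using an object where a URL string is expected coerces it to a string.
var coerced = String(selector);

// URL-encoding that string gives the path seen in the 404 message.
var requestedPath = '/' + encodeURI(coerced);

console.log(coerced);        // [object Object]
console.log(requestedPath);  // /%5Bobject%20Object%5D
```

Reading the attribute with getElementAttribute first, as in the snippet above, avoids this because the value handed to download is then already a string.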
Upvotes: 2