Samuel

Reputation: 6490

PhantomJS cannot open local files of unknown extension

I am using PhantomJS to take screenshots of local files. I passed it a perfectly valid HTML file:

<!DOCTYPE html><html><head><title>Title of the document</title></head><body>The file name dummy</body></html> 

saved under the file name dummy.hoo.

PhantomJS seems unable to open it. Is this documented somewhere? Local files with the extensions .html and .htm work fine, though.

Sample call (the path to the page is always converted to a file URI):

"Phantomjs.exe" --proxy-type=none --ssl-protocol=any --local-to-remote-url-access=true "Scripts\screenshot.js" "file:///D:/dummy.hoo" "base.png"

The script is simple:

var page = require('webpage').create();
var system = require('system');

if (system.args.length !== 3) {
    console.log('Usage: script.js <URL> <screenshot destination>');
    phantom.exit();
}

page.onResourceError = function(resourceError) {
    page.reason = resourceError.errorString;
    page.reason_url = resourceError.url;
};

page.open(system.args[1], function(status) {
    if (status !== 'success') {
        console.log('Failed to load address ' + system.args[1] + ' ' + page.reason_url + ': ' + page.reason);
        phantom.exit(-1);
    }
    page.render(system.args[2]);
    phantom.exit();
});

I can see the HTML contents of dummy.hoo properly when I copy the URI and paste it into Firefox or another browser. Only PhantomJS refuses to render it.

For dummy.hoo the script always takes the error path ("Failed to load address"): the status is fail and no reason is reported through the callback. (When I pass a non-existent URL, I do get a proper reason.)

Failed to load address file:///D:/dummy.hoo undefined: undefined

I used the verbose error output script linked from here: Debugging PhantomJS webpage.open failures

and this is the result:

= onNavigationRequested
  destination_url: file:///D:/dummy.hoo
  type (cause): Other
  will navigate: true
  from page's main frame: true
= onResourceRequested()
  request: {
    "headers": [
        {
            "name": "User-Agent",
            "value": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.9.0 Safari/534.34"
        },
        {
            "name": "Accept",
            "value": "*/*"
        }
    ],
    "id": 1,
    "method": "GET",
    "time": "2015-03-01T16:40:11.080Z",
    "url": "file:///D:/dummy.hoo"
}
= onLoadStarted()
  leaving url: about:blank
= onResourceReceived()
  id: 1, stage: "start", response: {"bodySize":110,"contentType":null,"headers":[{"name":"Last-Modified","value":"Sun, 01 Mar 2015 17:13:02 GMT"},{"name":"Content-Length","value":"110"}],"id":1,"redirectURL":null,"stage":"start","status":null,"statusText":null,"time":"2015-03-01T16:40:11.082Z","url":"file:///D:/dummy.hoo"}
= onResourceReceived()
  id: 1, stage: "end", response: {"contentType":null,"headers":[{"name":"Last-Modified","value":"Sun, 01 Mar 2015 17:13:02 GMT"},{"name":"Content-Length","value":"110"}],"id":1,"redirectURL":null,"stage":"end","status":null,"statusText":null,"time":"2015-03-01T16:40:11.082Z","url":"file:///D:/dummy.hoo"}
= onLoadFinished()
  status: fail
Failed to load address file:///D:/dummy.hoo undefined: undefined

Upvotes: 2

Views: 1358

Answers (1)

Samuel

Reputation: 6490

I was able to locate the code in PhantomJS that handles MIME types (there are several locations for different platforms):

https://github.com/ariya/phantomjs/blob/48fabe06463460d2fb7026d6df9783216e26265c/src/qt/qtwebkit/Source/WebCore/platform/MIMETypeRegistry.cpp#L154

https://github.com/ariya/phantomjs/blob/48fabe06463460d2fb7026d6df9783216e26265c/src/qt/qtwebkit/Source/WebCore/platform/win/MIMETypeRegistryWin.cpp#L80 etc.

The gist (hehe) behind that is that local files do not send header information containing the MIME type. PhantomJS therefore does not know which handler to invoke to render the content properly. I could basically rename a .jpeg to .exe; as long as a web server sends the JPEG MIME type, it will be rendered correctly. This is common behaviour on the web, where a URL can be routed to content based on anything (regex, extension, etc.).

PhantomJS does not have any kind of inference that detects the real contents of a file (which is entirely plausible), so it must rely on the file extension and the mapping given.

Knowing that, I have to accept that I can use only the .html and .htm extensions to render HTML data, and nothing else.
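To make the idea concrete, here is a minimal sketch of what an extension-based lookup looks like. It is not the actual WebKit code from the links above; the table contents and the name mimeTypeForPath are made up for illustration. It only shows why an unknown extension such as .hoo ends up with no content type, which matches the "contentType":null in the log from the question.

```javascript
// Hypothetical, heavily shortened extension table (illustration only).
var EXTENSION_TO_MIME = {
    html: 'text/html',
    htm: 'text/html',
    jpg: 'image/jpeg',
    jpeg: 'image/jpeg',
    png: 'image/png'
};

// Returns the MIME type guessed from the file extension, or null if unknown.
function mimeTypeForPath(path) {
    var dot = path.lastIndexOf('.');
    if (dot === -1) {
        return null; // no extension at all
    }
    var ext = path.slice(dot + 1).toLowerCase();
    return Object.prototype.hasOwnProperty.call(EXTENSION_TO_MIME, ext)
        ? EXTENSION_TO_MIME[ext]
        : null;
}

console.log(mimeTypeForPath('D:/dummy.html'));
console.log(mimeTypeForPath('D:/dummy.hoo'));
```

With a table like this, dummy.html resolves to text/html, while dummy.hoo resolves to nothing, so there is no handler to pick.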
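If renaming the file is not an option, one possible way around the extension lookup is to read the file yourself and hand the markup to the page with page.setContent, so that the content is treated as HTML regardless of the file name. The following is only a sketch under assumptions: I have not verified it against 1.9.0, fileUrlToPath is a helper I made up for this example, and it assumes the file is plain HTML that fs.read can load as text.

```javascript
// Hypothetical helper: turn "file:///D:/dummy.hoo" into "D:/dummy.hoo"
// (and "file:///home/me/x.hoo" into "/home/me/x.hoo").
function fileUrlToPath(url) {
    var path = decodeURIComponent(url.replace(/^file:\/\//, ''));
    if (/^\/[A-Za-z]:\//.test(path)) {
        path = path.slice(1); // drop the leading slash before a drive letter
    }
    return path;
}

// The part below only runs inside PhantomJS.
if (typeof phantom !== 'undefined') {
    var fs = require('fs');
    var system = require('system');
    var page = require('webpage').create();
    var url = system.args[1];

    page.onLoadFinished = function (status) {
        page.render(system.args[2]);
        phantom.exit(status === 'success' ? 0 : 1);
    };

    // Pass the markup directly; the URL is kept so relative links still resolve.
    page.setContent(fs.read(fileUrlToPath(url)), url);
}
```

The call stays the same as in the question, with the file URI as the first argument and the screenshot destination as the second.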

Upvotes: 2
