Serik Almatov

Reputation: 11

PhantomJS returns empty content from a certain website

There is a state website, gov.kz. I want to pull information from it into my project so that all the information about the state service is in one convenient place. Previously the site was on a different domain, and parsing it with the Simple HTML DOM library worked well. Since they changed the site, I can't parse it with Simple HTML DOM, cURL, or PhantomJS. The odd part is that the same PhantomJS parsing code works on every other site, just not this one. It looks as if index.html is being parsed without its content filled in (I'm not sure, but maybe the site's scripts do not run). The code is below. P.S. I tried several variations of this code and none of them helped.

get-website.php:

$response = [];
exec($_SERVER['DOCUMENT_ROOT'].'phantomjs --debug=no --ignore-ssl-errors=yes get-website.js 2>&1', $response);

var_dump($response);

get-website.js:

var page = require('webpage').create();
page.settings.javascriptEnabled = true;
page.settings.userAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36';
page.settings.loadImages = true;
page.settings.cookiesEnabled = true;
page.viewportSize = {
  width: 1366,
  height: 768
};

page.open('https://gov.kz');

page.onLoadFinished = function (status) {
    if (status !== 'success') {
        console.log('Unable to load the address!');
        phantom.exit();
    } else {
        var content = page.content;
        console.log('Content: ' + content);
        page.evaluate(function () {
        });

        page.render('image.png');

        phantom.exit();
    }
};
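
One thing that may be worth checking: the script above reads page.content as soon as the load finishes, so if the new site builds its markup with client-side JavaScript, the content could still be empty at that moment. The sketch below is only an illustration under that assumption, not a confirmed fix. It polls until the body has some text and logs page script errors, since PhantomJS ships an old WebKit engine that may fail on modern JavaScript. The `waitFor` helper name, the "body has text" readiness check, and the 10-second timeout are all assumptions made for this example.

```javascript
// Hypothetical helper: poll `test` every `intervalMs` until it returns a truthy
// value or `timeoutMs` elapses. Calls done(true) on success, done(false) on timeout.
function waitFor(test, done, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        var ok = false;
        try {
            ok = !!test();
        } catch (e) {
            ok = false; // treat a throwing check as "not ready yet"
        }
        if (ok || Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            done(ok);
        }
    }, intervalMs || 250);
}

// This part only runs inside PhantomJS; the guard lets the helper load elsewhere.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();

    // Print JavaScript errors raised by the page itself.
    page.onError = function (msg) {
        console.log('Page error: ' + msg);
    };

    page.open('https://gov.kz', function (status) {
        if (status !== 'success') {
            console.log('Unable to load the address!');
            phantom.exit(1);
            return;
        }
        waitFor(function () {
            // Assumed readiness check: the body contains some visible text.
            return page.evaluate(function () {
                return !!document.body && document.body.innerText.trim().length > 0;
            });
        }, function (ready) {
            console.log(ready ? page.content : 'Timed out waiting for rendered content');
            phantom.exit(ready ? 0 : 1);
        }, 10000);
    });
}
```

If the timeout message appears together with "Page error" lines, that would suggest the site's scripts do not run in PhantomJS at all, in which case waiting longer is unlikely to help.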

Upvotes: 1

Views: 200

Answers (0)
