Reputation: 901
I want to run an app on my Ubuntu VPS for crawl-testing purposes.
My app uses meteor-router from Atmosphere, installed with the mrt package manager.
On my local Mac OS X 10.8 machine, with phantomjs installed via brew, everything works fine. I get a correct snapshot of my page by appending ?_escaped_fragment_= to the URL, e.g. http://sample.com/?_escaped_fragment_=
I then tried the same on my Ubuntu VPS, in two ways:
1) Copy the unbundled app to the server and run it with mrt run. This is unstable: sometimes the page renders fine, but sometimes my dynamic content is blank, as if my database were empty.
2) Copy the unbundled app to the server, bundle it with mrt bundle fname.tgz, then unpack the .tgz and run its main.js with node. This way spiderable fails completely: I get blank output instead of dynamic data every time I try.
My Ubuntu machine has far less memory and CPU than my local machine, so it takes longer to generate the dynamic content. As a result, phantomjs seems to decide the page has finished loading and takes the snapshot before Meteor has rendered.
Any suggestions?
Upvotes: 3
Views: 513
Reputation: 839
I believe the proper way to do this is to pass a callback to page.open, like so (see the docs):
page.open(url, function (status) {
    ...
});
Also, if you want to rely on a timeout for the snapshot, I would shorten the timeout and wrap it in a polling loop, which makes it both faster and more reliable:
page.open(url, function (status) {
    if (status !== 'success') {
        phantom.exit();
        return;
    }
    // True once Meteor is connected and all subscriptions are ready.
    function isReady() {
        return page.evaluate(function () {
            if ('undefined' === typeof Meteor
                || 'undefined' === typeof(Meteor.status)
                || !Meteor.status().connected)
                return false;
            Meteor.flush();
            return Meteor._LivedataConnection._allSubscriptionsReady();
        });
    }
    // Poll every 100 ms until the page is ready, then print the snapshot.
    function trySnapshot() {
        if (!isReady()) {
            setTimeout(trySnapshot, 100);
            return;
        }
        console.log(page.content
            .replace(/<script[^>]+>(.|\\n|\\r)*?<\\/script\\s*>/ig, '')
            .replace('<meta name=\"fragment\" content=\"!\">', '')
        );
        phantom.exit();
    }
    trySnapshot();
});
I also think that my last snippet will often run without any timeout at all, because the page.open callback is called at the proper time.
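One caveat with the loop above: if the page never becomes ready, trySnapshot keeps rescheduling itself indefinitely. Below is a minimal sketch of a bounded version. It assumes a hypothetical helper named pollUntil (it is not a PhantomJS or Meteor API), and the optional schedule argument exists only so the loop can be exercised without real timers.

```javascript
// Hypothetical helper, not part of PhantomJS or Meteor.
// Calls isReady() up to maxAttempts times, waiting intervalMs between checks,
// then reports the outcome through done(true) or done(false).
function pollUntil(isReady, done, intervalMs, maxAttempts, schedule) {
  schedule = schedule || setTimeout; // injectable so the loop can be tested
  var attempts = 0;
  function tick() {
    attempts += 1;
    if (isReady()) {
      done(true);
    } else if (attempts >= maxAttempts) {
      done(false); // give up instead of polling forever
    } else {
      schedule(tick, intervalMs);
    }
  }
  tick();
}

// Possible use inside the page.open callback (assumption: 100 checks at
// 100 ms each, i.e. roughly a 10 second budget):
// pollUntil(isReady, function (ok) {
//   if (ok) console.log(page.content);
//   phantom.exit();
// }, 100, 100);
```

The attempt limit is a judgment call; a slow server may need a larger budget than a local machine.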
Upvotes: 0
Reputation: 901
I think I solved this issue. The problem really is in the spiderable.js file. This module runs phantomjs in REPL mode and feeds it the following code via stdin:
var url = '" + url + "';
var page = require('webpage').create();
page.open(url);
setInterval(function() {
    var ready = page.evaluate(function () {
        if (typeof Meteor !== 'undefined'
            && typeof(Meteor.status) !== 'undefined'
            && Meteor.status().connected) {
            Meteor.flush();
            return Meteor._LivedataConnection._allSubscriptionsReady();
        }
        return false;
    });
    if (ready) {
        var out = page.content;
        out = out.replace(/<script[^>]+>(.|\\n|\\r)*?<\\/script\\s*>/ig, '');
        out = out.replace('<meta name=\"fragment\" content=\"!\">', '');
        console.log(out);
        phantom.exit();
    }
}, 100);
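As a side note, the cleanup step at the end of that script can be tried outside phantomjs. The following is a minimal sketch, assuming the doubled backslashes in the quoted code are there only because it sits inside a JavaScript string in spiderable.js. The name stripForCrawler and the sample markup are made up for illustration.

```javascript
// Hypothetical helper (not in spiderable.js): the same two replace calls,
// written with single escapes as they would appear in a plain script.
function stripForCrawler(html) {
  return html
    // Drop <script ...>...</script> blocks. [^>]+ needs at least one character
    // after "<script", so a bare <script> tag without attributes is left alone.
    .replace(/<script[^>]+>(.|\n|\r)*?<\/script\s*>/ig, '')
    // Drop the first meta tag that asks crawlers for the escaped-fragment page.
    .replace('<meta name="fragment" content="!">', '');
}

var sample = '<head><meta name="fragment" content="!">' +
  '<script type="text/javascript">var a = 1;\nvar b = 2;</script></head>' +
  '<body><p>hello</p></body>';

console.log(stripForCrawler(sample));
// prints: <head></head><body><p>hello</p></body>
```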
The problem is that once all the Meteor conditions pass, the script assumes page.content is fully updated, but it is not. The solution I found and tested is to wrap the body of the if block in a setTimeout (500 ms worked fine for me):
if (ready) {
    setTimeout(function () {
        var out = page.content;
        out = out.replace(/<script[^>]+>(.|\\n|\\r)*?<\\/script\\s*>/ig, '');
        out = out.replace('<meta name=\"fragment\" content=\"!\">', '');
        console.log(out);
        phantom.exit();
    }, 500);
}
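A fixed 500 ms delay may be too short on a slower machine and longer than needed on a faster one. One possible alternative, which is a different technique from the delay above and untested against spiderable itself, is to wait until page.content stops changing between polls. Below is a minimal sketch, where makeStabilityChecker is a hypothetical helper and the number of polls to wait for is an assumption.

```javascript
// Hypothetical helper: returns a function that reports true once it has been
// called with the same content on `needed` consecutive calls.
function makeStabilityChecker(needed) {
  var last = null;
  var streak = 0;
  return function (content) {
    if (content === last) {
      streak += 1;
    } else {
      last = content; // content changed, start counting again
      streak = 1;
    }
    return streak >= needed;
  };
}

// Possible use inside the existing setInterval callback (assumption: three
// identical polls at 100 ms each are enough to call the page settled):
// var isStable = makeStabilityChecker(3);
// ...
// if (ready && isStable(page.content)) { /* strip, print, phantom.exit() */ }
```

Like the fixed delay, this is a heuristic: a page that pauses rendering for longer than the polling window would still be captured too early.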
Upvotes: 2