Reputation: 8086
I need to scrape a website form on the fly, and the site uses AJAX and sessions. I did a lot of research and came across several possible solutions, one being Python::Mechanize. I don't know Python, and from my understanding cURL alone in PHP cannot handle AJAX or submit forms.
I found what I believe is a possible stack that can lead me to grace :). The problem is that I do not know how to use these packages at all.
I downloaded and installed Node.js, and I can call it from CMD. (great)
I downloaded and installed PhantomJS. I'm not sure how to set up the PATH so that it can be called from anywhere, so I have to manually cd in CMD to its directory to get it to load. How can I set this up in Windows 7? I'm not sure where to point the path.
I downloaded CasperJS and put it in that directory.
With PhantomJS I was able to run a test file that echoes 'hello world' in the CMD prompt, and now I have no clue how to proceed. Ultimately I need this to run on the fly from my web server, so it needs to be integrated into my web page. For now I would like to just run it from CMD and get it to go to a page, submit a form, scrape the results, and write them to a file.
Can someone please explain a workflow for how I can accomplish this?
CasperJS shows this form example. I would like to adapt it with my own variables, run the script, and save the result.
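// Create the Casper instance that the rest of the script uses.
var casper = require('casper').create();

// Open the form page, fill in the fields, and submit (third argument: true).
casper.start('http://some.tld/contact.form', function() {
    this.fill('form#contact-form', {
        'subject': 'I am watching you',
        'content': 'So be careful.',
        'civility': 'Mr',
        'name': 'Chuck Norris',
        'email': '[email protected]',
        'cc': true,
        'attachment': '/Users/chuck/roundhousekick.doc'
    }, true);
});

// Stop with an error message unless the confirmation text is on the page.
casper.then(function() {
    this.evaluateOrDie(function() {
        return /message sent/.test(document.body.innerText);
    }, 'sending message failed');
});

casper.run(function() {
    this.echo('message sent').exit();
});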
Upvotes: 1
Views: 2444
Reputation: 350
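After you install PhantomJS, add its folder to the PATH environment variable. In Windows 7: Control Panel > System > Advanced system settings > Environment Variables, then edit Path and append the folder, separated from the existing entries by a semicolon. Open a new CMD window afterwards so the change is picked up.
Now you can use phantomjs from CMD. Ex.: phantomjs c:\mywebsite\with\ajax\dopescript.js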
After these steps, download CasperJS and put it in the PhantomJS folder.
Ex.: c:\phantomjs\casperjs
Repeat the previous PATH steps for CasperJS, adding \bin at the end.
Ex.: c:\phantomjs\casperjs\bin
Try casperjs from CMD.
If it's not working, go to the batchbin directory in the casperjs folder and launch casperjs.bat.
Now try to call CasperJS from this folder. (Works for me)
At this point you should have PhantomJS + CasperJS.
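Since Node.js is already installed, one way to confirm that both tools are reachable is a small Node script that walks PATH and looks for them. This is only a sketch: the file name check-path.js and the helper findOnPath are made up for illustration, and the fallback extension list is an assumption used when PATHEXT is not set.

```javascript
// check-path.js (hypothetical name): run with `node check-path.js`
var fs = require('fs');
var path = require('path');

// Return the full path of the first match for `name` in the PATH-style
// string `pathVar`, trying each extension in `exts`, or null if none is found.
// `exists` is the function used to test whether a candidate file exists.
function findOnPath(name, pathVar, exts, exists) {
    var dirs = pathVar.split(path.delimiter).filter(Boolean);
    for (var i = 0; i < dirs.length; i++) {
        for (var j = 0; j < exts.length; j++) {
            var candidate = path.join(dirs[i], name + exts[j]);
            if (exists(candidate)) {
                return candidate;
            }
        }
    }
    return null;
}

// On Windows, PATHEXT lists the extensions CMD treats as runnable.
// The fallback list below is an assumption for when PATHEXT is missing.
var exts = process.platform === 'win32'
    ? (process.env.PATHEXT || '.EXE;.BAT;.CMD').split(';')
    : [''];

['phantomjs', 'casperjs'].forEach(function (tool) {
    var found = findOnPath(tool, process.env.PATH || '', exts, fs.existsSync);
    console.log(tool + ': ' + (found || 'not found on PATH'));
});
```

If a tool is reported as not found, the folder added to PATH is probably wrong, or CMD was opened before the variable was changed.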
About saving results:
Put var fs = require('fs'); at the beginning of your script and call
fs.write('result.html', myData);
where myData is the data that you need to save.
Here is more information about FS: PhantomJS File System
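Putting the pieces together, a rough sketch of the whole workflow (open the page, submit the form, wait for the AJAX result, write it to a file) could look like the script below. The URL, form selector, and field names are the placeholders from the CasperJS example in the question. The '#results' selector, the file name scrape.js, and the scrape wrapper function are assumptions made for illustration and need to be replaced with whatever the real page uses.

```javascript
// scrape.js (hypothetical name): run with `casperjs scrape.js`

// Wrapping the steps in a function keeps the casper and fs objects explicit.
function scrape(casper, fs, outFile) {
    // Step 1: open the page, fill the form, and submit it (third argument: true).
    casper.start('http://some.tld/contact.form', function () {
        this.fill('form#contact-form', {
            'subject': 'I am watching you',
            'content': 'So be careful.'
        }, true);
    });

    // Step 2: wait until the AJAX response has put the result on the page.
    // '#results' is an assumed selector, not something from the real site.
    casper.waitForSelector('#results', function () {
        // Step 3: grab the current page source and write it to disk.
        var myData = this.getPageContent();
        fs.write(outFile, myData, 'w');
        this.echo('saved ' + myData.length + ' characters to ' + outFile);
    });

    // Nothing happens until run() is called; exit when all steps are done.
    casper.run(function () {
        this.exit();
    });
}

// Only start when running under PhantomJS/CasperJS, where `phantom` is defined.
if (typeof phantom !== 'undefined') {
    scrape(require('casper').create(), require('fs'), 'result.html');
}
```

The file is written relative to the directory casperjs is started from. If only part of the page is needed, this.getHTML('#results') would return just that element's markup instead of the full page.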
Upvotes: 2