Reputation: 1685
Say someone else has a website generated by JavaScript, so I can't go look at the source and read what should be on the screen. How can I grab the text on the screen so I can feed it into another program? Also, how can I write a program that automatically clicks on radio buttons, links, etc. that satisfy certain criteria?
Upvotes: 1
Views: 169
Reputation: 1765
You can write a web scraping tool in Perl or Python. Or, you can use existing tools and frameworks to achieve that.
Check out Scrapy, an open-source tool written in Python.
Take a look at Selenium too.
Upvotes: 1
Reputation: 161773
If you need to handle content generated by script, then your first problem is to cause the script to execute. Further, the script will want to generate the content into a DOM. That means you need to have a DOM, and a script engine, and probably HTTP access to the Internet, and XML handling, etc.
If that sounds a lot like a web browser, then you're listening.
What you basically need is a web browser that you can control from a program. You'll need to be able to tell it to browse to a page, click buttons and links, etc., then you'll need to read back the resulting DOM.
Only then will you need to parse the page.
If you're in the Microsoft world, then you can use the WebBrowser control. There are several forms of this, and they all amount to the same thing: you can have Internet Explorer run inside of your program, and your program can control it.
I understand there are other browsers that can be controlled from a program, but since I don't know their details, I'll wait for someone else to tell us both.
Upvotes: 1
Reputation: 4067
To parse dynamic content you could see the javascript source and get that same content the same way the webpage is getting it. (ie. replicating ajax calls and such)
If you want to submit data (not actually click on the elements) as if it were clicked/edited/selected you could also send a request containing the same data that the server is expecting by using some HTTP library, like CURL. See an example here.
Upvotes: 1