Reputation: 11285
I am trying to use PyQt to load the HTML of a web page, which can then be manipulated and fed back to the page, for web scraping. Basically, I am trying to log into a page that uses JavaScript, search for documents to download (by selecting the check box next to each of the right ones' names), and then click a download button, which opens another page.
Does anyone know which functions I would use? Is there a way to discuss this without going into classes? (My understanding of classes is not as good as it could be; I am trying to learn, but I'm still something of a beginner.)
Sorry if I didn't explain this well. I'm trying to use either PyQt or PySide to do this.
Reputation: 29452
I use PyQt/PySide to load a page, wait for the JavaScript to execute, and then parse the resulting HTML for the content of interest.
Here is an example script:
http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/
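A minimal sketch of that approach without writing any classes, assuming PyQt5 with the separate PyQtWebEngine package installed. The linked post uses the older QtWebKit module, which newer Qt releases no longer ship, so QtWebEngine is swapped in here. The function name `render` and its optional `script` argument are hypothetical: `script` is JavaScript run in the page after it loads (for example, to tick a check box) before the HTML is read back. Qt is imported inside the function so the file can be imported even where Qt is absent.

```python
import sys


def render(url, script=None):
    """Load url in an off-screen QtWebEngine page and return its HTML.

    If script is given, that JavaScript runs in the page after loading and
    the HTML is read back afterwards (e.g. with some check boxes ticked).
    Returns None if the page fails to load.
    """
    # Qt is imported here so this module can be imported without Qt.
    # QtWebEngineWidgets must be imported before the QApplication is created.
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebEngineWidgets import QWebEnginePage
    from PyQt5.QtWidgets import QApplication

    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebEnginePage()  # no view attached, so no window is shown
    result = {}

    def got_html(html):
        # toHtml() is asynchronous and delivers the current DOM here.
        result["html"] = html
        app.quit()

    def on_load_finished(ok):
        # Only react to the first load, in case the script triggers navigation.
        page.loadFinished.disconnect(on_load_finished)
        if not ok:
            app.quit()
        elif script:
            # Run the caller's JavaScript, then read the modified DOM.
            page.runJavaScript(script, lambda _value: page.toHtml(got_html))
        else:
            page.toHtml(got_html)

    page.loadFinished.connect(on_load_finished)
    page.load(QUrl(url))
    app.exec_()  # runs the Qt event loop until app.quit() is called
    return result.get("html")
```

For example, `render("https://example.com/search", "document.querySelector('input[value=\"42\"]').checked = true;")` would return the page's HTML with that (hypothetical) box ticked. Note that `loadFinished` only means the initial load is done; content fetched later by JavaScript may need an extra wait.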
Reputation: 2021
I think you are confused about where things happen, so it is not clear to me what you are attempting to do, but let's make a guess.
I think you want to automate the use of a web site, where you have to call up a selection page, tick a box, click a button and handle the resulting download.
If you only want to do it a few times, for testing the site, then check out Watir and Selenium.
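For the Selenium route, a sketch of ticking boxes and clicking the download button, assuming Selenium 4 for Python with Firefox and geckodriver installed. The function name, finding boxes by their `value` attribute, and the `button_id` parameter are hypothetical placeholders for whatever the real page uses; logging in first is left out. It returns the HTML of the window that the button opens.

```python
def tick_and_download(url, checkbox_values, button_id, timeout=10):
    """Tick the given check boxes, click the button, return the popup's HTML."""
    # Selenium is imported here so this module can be imported without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        original = driver.current_window_handle
        for value in checkbox_values:
            box = driver.find_element(
                By.CSS_SELECTOR, f'input[type="checkbox"][value="{value}"]'
            )
            if not box.is_selected():
                box.click()
        driver.find_element(By.ID, button_id).click()
        # Wait for the download button to open a second window, then switch to it.
        WebDriverWait(driver, timeout).until(EC.number_of_windows_to_be(2))
        popup = next(h for h in driver.window_handles if h != original)
        driver.switch_to.window(popup)
        return driver.page_source
    finally:
        driver.quit()  # close the browser even if a step above fails
```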
If you really wish to code it up in Python, then you will have to understand the page containing the check box well enough to find and extract the form, build a POST from the fields in that form, and send the POST to get your download. If the page contains JavaScript, it may add or remove fields, or otherwise prevent you from creating a valid POST.
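A sketch of the form-extraction step using only the standard library. It collects the `<input>` fields of the first `<form>` on the page and URL-encodes them the way a browser would for a plain POST, including only the check boxes that are already checked or whose `value` is in `tick`. The names `FormFields`, `build_post` and `tick` are hypothetical; `<select>` and `<textarea>` elements are ignored, and any fields that JavaScript would add at runtime are not seen.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Input types a browser does not send as ordinary form data.
SKIPPED_TYPES = {"submit", "button", "reset", "image", "file"}


class FormFields(HTMLParser):
    """Collect the action and <input> fields of the first <form> in a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = []  # (name, type, value, checked) in document order
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            kind = (attrs.get("type") or "text").lower()
            value = attrs.get("value")
            if value is None:
                # Browsers send "on" for a checked box that has no value attribute.
                value = "on" if kind in ("checkbox", "radio") else ""
            self.fields.append((attrs["name"], kind, value, "checked" in attrs))

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_post(html, tick=()):
    """Return (action, urlencoded body) for the first form in html."""
    parser = FormFields()
    parser.feed(html)
    pairs = []
    for name, kind, value, checked in parser.fields:
        if kind in SKIPPED_TYPES:
            continue
        if kind in ("checkbox", "radio") and not (checked or value in tick):
            continue  # unticked boxes are not submitted
        pairs.append((name, value))
    return parser.action, urlencode(pairs)
```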
Then you will have to catch and save the resulting download.
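A sketch of sending that POST and saving the response to disk with the standard library, assuming the form's `action` and encoded body are already known (for example, from a form-parsing step like the one described above). A cookie-aware opener is used so that a session cookie set at login would be sent along. The function names are hypothetical, and no attempt is made to read a file name from the response headers.

```python
import http.cookiejar
import shutil
import urllib.request
from urllib.parse import urljoin


def make_opener():
    """Return an opener that keeps cookies (e.g. a login session) between requests."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar())
    )


def make_request(page_url, action, body):
    """Build a POST to the form's action, resolved relative to the page URL."""
    target = urljoin(page_url, action) if action else page_url
    return urllib.request.Request(target, data=body.encode("ascii"), method="POST")


def save_stream(stream, path):
    """Copy a file-like response body to disk in chunks rather than all at once."""
    with open(path, "wb") as out:
        shutil.copyfileobj(stream, out)


def download(opener, page_url, action, body, path):
    """Send the POST through the opener and write the response body to path."""
    with opener.open(make_request(page_url, action, body)) as resp:
        save_stream(resp, path)
```

The same opener should be used for the login request and the download, so the session cookie carries over between them.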
And you will have to make a panicked change to your code every time the site changes its HTML pages.
I don't envy you that job one bit.