Shaun314

Reputation: 3461

Python 3.X Extract Source Code ONLY when page is done loading

I submit a query on a web page. The query takes several seconds before it is done. Only when it is done does it display an HTML table that I would like to get the information from. Let's say this query takes a maximum of 4 seconds to load. While I would prefer to get the data as soon as it is loaded, it would be acceptable to wait 4 seconds then get the data from the table.

The issue I have is that when I make my urlopen request, the page hasn't finished loading yet. I tried loading the page, issuing a sleep command, and then reading it again, but that does not work either.

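My code is:

import urllib.request
import time

urlname = 'http://bookscouter.com/prices.php?isbn=9781111835811'
uf = urllib.request.urlopen(urlname)
time.sleep(3)  # wait for the query to finish before reading
text = uf.read().decode('utf-8')  # read() returns bytes; decode them to str
print(text)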

The webpage I am looking at is http://bookscouter.com/prices.php?isbn=9781111835811 (feel free to ignore the interesting textbook haha)

I am using Python 3.X on a Raspberry Pi.

Upvotes: 1

Views: 507

Answers (1)

kindall

Reputation: 184250

The prices you want are not in the page you're retrieving, so no amount of waiting will make them appear. Instead, JavaScript in that page fetches the prices after the page has loaded. The urllib module is not a browser, so it won't run that script for you. You'll want to figure out the URL of the AJAX request (a quick look at the page source gives a pretty big hint) and retrieve that instead. The response is probably JSON, so you can parse it with Python's json module.
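
A minimal sketch of that approach, using only the standard library. It assumes the AJAX endpoint returns JSON. The actual endpoint URL is not given here: you would find it in the page source or in your browser's network inspector. The helper names (`html_has_table`, `parse_json_bytes`, `fetch_json`) are hypothetical and not part of any library. `html_has_table` lets you confirm that the table is missing from the HTML the server sends, which is why sleeping before `read()` cannot help.

```python
import json
import urllib.request
from html.parser import HTMLParser


class TableDetector(HTMLParser):
    """Records whether the raw HTML contains any <table> element."""

    def __init__(self):
        super().__init__()
        self.found_table = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == "table":
            self.found_table = True


def html_has_table(html: str) -> bool:
    """Return True if the server-sent HTML already contains a <table>.

    urllib never runs JavaScript, so if this returns False the table is
    built client-side, and waiting before read() cannot change that.
    """
    detector = TableDetector()
    detector.feed(html)
    detector.close()
    return detector.found_table


def parse_json_bytes(raw: bytes, encoding: str = "utf-8"):
    """Decode an HTTP response body and parse it as JSON."""
    return json.loads(raw.decode(encoding))


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL that returns JSON, such as the page's AJAX endpoint.

    The URL is whatever you find in the page source; none is assumed here.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        return parse_json_bytes(resp.read(), charset)
```

Usage would be `data = fetch_json(ajax_url)`, where `ajax_url` is the request URL you found. The shape of `data` depends entirely on that endpoint, so inspect it (for example with `print(data)`) before extracting prices.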

Upvotes: 4
