Joschua

Reputation: 6034

Python: Is there a way to get HTML that was dynamically created by JavaScript?

As far as I can tell, this is the case for LyricWikia. The lyrics (example) are visible in the browser, but they do not appear in the page source (CTRL + U in most browsers), nor in the contents fetched with Python:

from urllib.request import urlopen

URL = 'http://lyrics.wikia.com/Billy_Joel:Piano_Man'

r = urlopen(URL).read().decode('utf-8')

And the test:

>>> 'Now John at the bar is a friend of mine' in r
False
>>> 'John' in r
False

But when you select the box in which the lyrics are displayed and inspect its source, you can see the element: <div class="lyricbox">[...]</div>

Is there a way to get the contents of that div-element with Python?

Upvotes: 1

Views: 88

Answers (1)

Alexander Gessler

Reputation: 46677

You can try Ghost.py, which is essentially PhantomJS for Python. It embeds WebKit and can therefore execute the JavaScript on the page as if you had opened it in a browser. It then gives you access to the resulting DOM structure.
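
Once the rendered HTML is available as a string (from a headless browser or by any other means), the contents of the div can be pulled out with the standard library alone. The following is only a rough sketch of that second step: it does not use Ghost.py's API, and the names LyricBoxParser and extract_lyricbox are made up for illustration. It assumes the lyrics sit in the first <div class="lyricbox"> and that line breaks are marked with <br> tags.

```python
from html.parser import HTMLParser


class LyricBoxParser(HTMLParser):
    """Collect the text inside the first <div class="lyricbox">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0     # <div> nesting depth inside the target; 0 means outside
        self.done = False  # set once the target div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == 'div':
                self.depth += 1          # nested div: remember to skip its close
            elif tag == 'br':
                self.parts.append('\n')  # treat <br> as a line break
        elif tag == 'div':
            classes = (dict(attrs).get('class') or '').split()
            if 'lyricbox' in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == 'div':
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_lyricbox(html):
    """Return the text of the first lyricbox div, or '' if there is none."""
    parser = LyricBoxParser()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts).strip()
```

Here html would be the page source after the JavaScript has run; passing the raw urlopen result from the question would still return an empty string, because the div is not in that response.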

Upvotes: 2
