Reputation: 6034
As far as I can tell, this is the case for LyricWikia. The lyrics (example) can be accessed from the browser, but can't be found in the source code (can be opened with CTRL + U in most browsers) or reading the contents of the site with Python:
from urllib.request import urlopen
URL = 'http://lyrics.wikia.com/Billy_Joel:Piano_Man'
r = urlopen(URL).read().decode('utf-8')
And the test:
>>> 'Now John at the bar is a friend of mine' in r
False
>>> 'John' in r
False
But when you select and look at the source code of the box in which the lyrics are displayed, you can see that there is: <div class="lyricbox">[...]</div>
Is there a way to get the contents of that div
-element with Python?
Upvotes: 1
Views: 88
Reputation: 46677
You can try Ghost.py, which is essentially Phantom.js for Python. It embeds WebKit and is thus able to execute the JavaScript on the page as if you had navigated to the page manually. It then gives you access to the DOM structure.
Upvotes: 2