Reputation: 59
I have seen many solutions on the internet, but none of them seem to work.
I have this code to retrieve information about a user on IMDb:
from lxml import html
import requests
page = requests.get('http://www.imdb.com/user/ur6447592/comments-expanded?start=0&order=alpha')
tree = html.fromstring(page.content)
result = tree.xpath('//*[@id="outerbody"]/tbody/tr/td/b[2]/text()')
print(result)
The output should be:
["Little flesh and all bones"]
Upvotes: 1
Views: 655
Reputation: 12610
Change the XPath argument to (the same path with the tbody step removed):
'//*[@id="outerbody"]/tr/td/b[2]/text()'
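To see the difference without hitting the site, here is a minimal sketch on a made-up snippet. The markup below is an assumption for illustration (it is not the real IMDb page); it only mimics a table that the server sends without a tbody element.

```python
from lxml import html

# Hypothetical markup: a table with no <tbody>, as a server might send it.
raw = (
    '<html><body><table id="outerbody"><tr><td>'
    '<b>Author</b><b>Little flesh and all bones</b>'
    '</td></tr></table></body></html>'
)
tree = html.fromstring(raw)

# XPath with a tbody step vs. the same path without it.
with_tbody = tree.xpath('//*[@id="outerbody"]/tbody/tr/td/b[2]/text()')
without_tbody = tree.xpath('//*[@id="outerbody"]/tr/td/b[2]/text()')

print(with_tbody)
print(without_tbody)
```

On this snippet the first query should match nothing, because lxml does not add a tbody element that is missing from the markup, while the second should return the text of the second b element.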
Edit:
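Thanks to the comments, I now see why the OP ran into this problem: browsers insert a tbody element into the DOM when they parse a table, so an XPath copied from the browser's developer tools can include a tbody step that does not exist in the HTML the server actually sends.
You can print page.content to see the original HTML. (via @JacobIRR)
Or, in Firefox, open Tools > Web Developer > Page Source.
In Google Chrome Developer Tools, as quoted from @corn3lius: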
If you use the network tab and look at the document returned it will give you the original state before whoever messes with the DOM.
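If you would rather check this in code than by eye, something like the following could work. It is only a sketch: has_tbody and drop_tbody_steps are hypothetical helper names I made up, and raw stands in for page.content.

```python
import re


def has_tbody(raw_html: bytes) -> bool:
    """Return True if the raw markup itself contains a <tbody> tag."""
    return re.search(rb"<tbody[\s>]", raw_html, flags=re.IGNORECASE) is not None


def drop_tbody_steps(xpath: str) -> str:
    """Remove /tbody steps that a browser-copied XPath may contain."""
    return xpath.replace("/tbody", "")


# Stand-in for page.content (hypothetical markup, not the real page).
raw = b'<table id="outerbody"><tr><td><b>a</b><b>b</b></td></tr></table>'
browser_xpath = '//*[@id="outerbody"]/tbody/tr/td/b[2]/text()'

# Keep the tbody step only if the server really sent one.
fixed_xpath = browser_xpath if has_tbody(raw) else drop_tbody_steps(browser_xpath)
print(fixed_xpath)
```

The plain string replace is deliberately naive; it assumes no other step in the path starts with "tbody".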
Upvotes: 3