Reputation: 1849
I am trying to extract some text and links from instapaper.com
. So I am using the following code to get the job done:
>>> import lxml.html as lh
>>> doc = lh.parse("http://www.instapaper.com/u/folder/1227370/programming")
>>> text = doc.xpath(".//*[@id='bookmark_list']/*/div[3]/a/text()")
>>> len(text)
0
>>> text
[]
As you can see it returns an empty list which means that it is not able to find any text matching the above xpath .
Now when I use the above xpath expr
in firebug/firepath it works fine.
You can see in the above image it shows 40 matching nodes
.
So, my question is why the above xpath expression is not working with python/lxml.
As requested Instapaper page source
Upvotes: 0
Views: 3974
Reputation:
There is no element with the ID bookmark_list
. Maybe you must be logged in.
Edit
Parsing the real HTML it works:
doc = lh.parse("http://pastebin.com/raw.php?i=1WpFAfCt")
text = doc.xpath("//*[@id='bookmark_list']/*/div[3]/a/text()")
len(text) # => 40
Upvotes: 5