Sighonide

Reputation: 684

Trouble with scraping text from site using lxml / xpath()

Quick one. I'm new to using lxml and have spent quite a while trying to scrape text data from a particular site. The element structure is as shown below:

http://tinypic.com/r/2iw7zaa/8

What I want to do is extract the 100,100 that is shown within the highlighted area. The statements I've tried include the following (I saved the source of the site into a text file, test.txt, to test; I also tried it with an .html extension):

from lxml import html
tree = html.parse('test.txt')
#value = tree.xpath('//*[@id="content"]/table[4]/tbody/tr[1]/td[2]')
#value = tree.xpath('//*[@id="content"]/table[4]/tbody/tr[1]/td[2]/text()')

All I seem to get as a result is an empty list, []. Any help would be greatly appreciated.

PS: I commented out the two value statements because I'm showing what I tried. I tried a bunch of other XPath statements similar to the ones above, but they were lost when the Python shell crashed on me.

PPS: Apologies for the link to the pic; due to rep I can't post the pic directly.

Upvotes: 0

Views: 308

Answers (1)

chishaku

Reputation: 4643

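Try removing `/tbody` from the XPath.

The browser might be adding the `tbody` tag to the table when it builds the page, whereas it might not appear in the raw HTML that lxml parses. An XPath copied from the browser's inspector would then contain a step that matches nothing in the saved source, which gives an empty list.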
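As a rough sketch of the difference, the snippet below runs the same query with and without the `tbody` step. The markup in `RAW` is made up for illustration (the real page is only known from the screenshot), and the table index is adjusted to fit it; it only assumes the raw HTML has no `tbody`, as suggested above.

```python
from lxml import html

# Hypothetical stand-in for the saved page source; note there is no <tbody>.
RAW = """
<html><body>
<div id="content">
  <table><tr><td>other</td></tr></table>
  <table><tr><td>Label</td><td>100,100</td></tr></table>
</div>
</body></html>
"""

tree = html.fromstring(RAW)

# Path as a browser inspector might report it: includes a tbody step.
with_tbody = tree.xpath('//*[@id="content"]/table[2]/tbody/tr[1]/td[2]/text()')

# Same path with the tbody step removed.
without_tbody = tree.xpath('//*[@id="content"]/table[2]/tr[1]/td[2]/text()')

# Using // before tr matches whether or not a tbody is present.
either = tree.xpath('//*[@id="content"]/table[2]//tr[1]/td[2]/text()')

print(with_tbody)
print(without_tbody)
print(either)
```

If the source might or might not contain `tbody`, the `//tr` form is probably the safer of the two, since it does not depend on that step being present.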

Read more here and here.

Upvotes: 1
