QM.py

Reputation: 652

Why does my program return an empty list when I use lxml to extract information from a website?

I want to extract the "Name of Member" column from http://164.100.47.132/LssNew/Members/Alphabaticallist.aspx, so the program should return a list like "Adhalrao Patil,Shri Shivaji..". Instead, I get an empty list. The XPath was verified in FirePath, so I don't know what's wrong. Here is my code:

import StringIO
import urllib
from lxml import etree

# Download the page (Python 2 urllib API)
result = urllib.urlopen("http://164.100.47.132/LssNew/Members/Alphabaticallist.aspx")
html = result.read()

# Parse the raw HTML with lxml's HTML parser
parser = etree.HTMLParser()
tree = etree.parse(StringIO.StringIO(html), parser)
print type(tree)
xpath = ".//* [@id='ctl00_ContPlaceHolderMain_Alphabaticallist1_dg1']/tbody/tr[position()>1]/td[position()=3]/a/text()"
filtered_html = tree.xpath(xpath)

print filtered_html

and it returns:

[]

However, when I use a different XPath:

.//*[@id='ctl00_ContPlaceHolderMain_Alphabaticallist1_dg1_ctl02_Hyperlink2']

I can get the value of the first column:

[Adhalrao Patil,Shri Shivaji]        

Both XPath expressions were verified in FirePath. Why doesn't the first one work?

Upvotes: 1

Views: 339

Answers (1)

Birei

Reputation: 36262

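My guess is that the <tbody> element is not present in the HTML that lxml parses. Browsers add <tbody> to the DOM automatically when the source omits it, which is why FirePath matches the expression, but lxml only sees the raw markup. Try the XPath without it: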

xpath = ".//* [@id='ctl00_ContPlaceHolderMain_Alphabaticallist1_dg1']/tr[position()>1]/td[position()=3]/a/text()"
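
To illustrate the difference, here is a minimal sketch that parses a small inline table instead of the live page. The sample HTML, the shortened id `dg1`, the column layout, and the second member name are all made up for the example; only the shape of the XPath expressions comes from the question. It is written with `print()` so it should run on Python 3 as well.

```python
from lxml import etree

# Hypothetical, simplified stand-in for the real page.
# Note that the source markup contains no <tbody> element.
SAMPLE_HTML = """
<html><body>
<table id="dg1">
  <tr><th>S.No.</th><th>Photo</th><th>Name of Member</th></tr>
  <tr><td>1</td><td></td><td><a href="#">Adhalrao Patil,Shri Shivaji</a></td></tr>
  <tr><td>2</td><td></td><td><a href="#">Second Member,Shri Example</a></td></tr>
</table>
</body></html>
"""

tree = etree.fromstring(SAMPLE_HTML, etree.HTMLParser())

# XPath as copied from a browser tool: it expects a <tbody> that lxml never sees.
with_tbody = tree.xpath(
    ".//*[@id='dg1']/tbody/tr[position()>1]/td[position()=3]/a/text()")

# Same XPath with the <tbody> step removed.
without_tbody = tree.xpath(
    ".//*[@id='dg1']/tr[position()>1]/td[position()=3]/a/text()")

# Using // before tr matches the rows whether or not a <tbody> is present.
either = tree.xpath(
    ".//*[@id='dg1']//tr[position()>1]/td[position()=3]/a/text()")

print(with_tbody)
print(without_tbody)
print(either)
```

The `//tr` variant is handy when you are not sure whether the served HTML includes <tbody>, since it matches rows at any depth under the table.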

Upvotes: 2
