Reputation: 22440
I have written a Python script that uses CSS selectors to parse names and phone numbers from a webpage. The script does not give me the results I expect; instead, some information I don't want comes along as well. How can I fix my selectors so that they pick up only the name and the phone number and nothing else? For reference, I've pasted a link to the HTML at the bottom. Thanks in advance.
Here is what I've written:
from lxml.html import fromstring

root = fromstring(html)
for tree in root.cssselect(".cbFormTableEvenRow"):
    try:
        name = tree.cssselect(".cbFormDataCell span.cbFormData")[0].text
    except:
        name = ""
    try:
        phone = tree.cssselect(".cbFormLabel:contains('Phone Number')+td.cbFormDataCell .cbFormData")[0].text
    except:
        phone = ""
    print(name, phone)
Results I expect:
JAYMES CARTER (402)499-8846
Results I'm getting:
1840390831
RESIDENTIAL
JAYMES CARTER (402)499-8846
None
My valuation jumped by almost $60,000 in one year. There are multiple comparable properties nearby that are much lower than my $194,300 evaluation, and a lot closer to my 2016 year evaluation of $134,400.
Link to the html file:
https://www.dropbox.com/s/64apg5cjpssd3hb/html_table.html?dl=0
Upvotes: 0
Views: 51
Reputation: 21663
Find the tr element that is the grandparent of the span whose text is 'Phone Number'. From there, get the td elements of the desired items and follow the hierarchy down from these to their texts.
>>> from lxml.html import fromstring
>>> root = fromstring(open('html_table.html').read())
>>> grand_parent = root.xpath('.//td[contains(text(),"Phone Number")]/..')[0]
>>> grand_parent.xpath('td[1]/span/text()')[0]
'JAYMES CARTER'
>>> grand_parent.xpath('td[5]/span/text()')[0]
'(402)499-8846'
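If the page holds several such rows, the same idea can be wrapped in a loop that keeps only rows containing a 'Phone Number' label. The following is a minimal sketch, not a drop-in solution: SAMPLE_HTML is a made-up fragment that only imitates the class names from the question (the real file may be laid out differently), and extract_contacts is a hypothetical helper name.

```python
from lxml.html import fromstring

# Hypothetical markup that imitates the class names in the question;
# the real page's structure may differ.
SAMPLE_HTML = """
<table>
  <tr class="cbFormTableEvenRow">
    <td class="cbFormDataCell"><span class="cbFormData">1840390831</span></td>
  </tr>
  <tr class="cbFormTableEvenRow">
    <td class="cbFormDataCell"><span class="cbFormData">JAYMES CARTER</span></td>
    <td class="cbFormDataCell"><span class="cbFormData">&nbsp;</span></td>
    <td class="cbFormLabel">Phone Number</td>
    <td class="cbFormDataCell"><span class="cbFormData">(402)499-8846</span></td>
  </tr>
</table>
"""

# XPath predicate matching the cell that carries the label text.
LABEL_CELL = 'td[contains(normalize-space(.), "Phone Number")]'


def extract_contacts(html):
    """Return (name, phone) pairs for rows that have a 'Phone Number' label."""
    root = fromstring(html)
    contacts = []
    # Rows without the label cell are skipped, so unrelated values never show up.
    for row in root.xpath('.//tr[%s]' % LABEL_CELL):
        # string(...) yields '' instead of raising when nothing matches.
        name = row.xpath('string(td[1]//span[@class="cbFormData"])').strip()
        phone = row.xpath(
            'string(%s/following-sibling::td[1]//span[@class="cbFormData"])'
            % LABEL_CELL
        ).strip()
        contacts.append((name, phone))
    return contacts


for name, phone in extract_contacts(SAMPLE_HTML):
    print(name, phone)
```

Selecting the phone cell relative to its label (following-sibling::td[1]) rather than by a fixed index such as td[5] should be less sensitive to extra spacer cells, assuming the value always sits in the cell right after the label.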
Addendum in response to comment:
>>> items = grand_parent.xpath('.//span[@class="cbFormData"]/text()')
>>> items
['JAYMES CARTER', '\xa0', '(402)499-8846']
>>> [_.replace('\xa0', '').strip() for _ in items]
['JAYMES CARTER', '', '(402)499-8846']
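The cleaned list still contains an empty string where the non-breaking space was. A small sketch of how those leftovers could be dropped so that only the name and phone number remain (clean_items is a hypothetical helper name, and the input list is copied from the output above):

```python
def clean_items(items):
    """Strip whitespace (including non-breaking spaces) and drop empty entries."""
    cleaned = (item.replace('\xa0', ' ').strip() for item in items)
    return [item for item in cleaned if item]


items = ['JAYMES CARTER', '\xa0', '(402)499-8846']
name, phone = clean_items(items)
print(name, phone)  # JAYMES CARTER (402)499-8846
```

The tuple unpacking assumes exactly two non-empty values per row; if a row can hold more or fewer, it would be safer to inspect the returned list before unpacking.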
Upvotes: 1