Reputation: 603
Hi, I want to get all the text from a td tag, but within that td tag there are multiple child tags.
>>> import urllib2
>>> from lxml import etree
>>> import lxml
>>> site = "http://racing.racingnsw.com.au/InteractiveForm/HorseAllForm.aspx?HorseCode=ODA0ODQ0MTUy&src=horsesearch"
>>> req = urllib2.Request(site)
>>> page = urllib2.urlopen(req)
>>> content = page.read()
>>> root = etree.HTML(content)
>>> s = root.xpath('//*[@id="info-container"]/table[2]/tr[%s]/td[2]/text()'%'34')
>>> s
[' 1800m Good3 PETER YOUNG STK Group 2 $222,000 ($134,000) ', ' 59kg Barrier 5 Rtg 118 ', ' 2nd ', ' 59kg, 3rd ', ' 59kg 1:50.09 (600m 34.92), 0.1L, 7th@800m, 6th@400m, $2/$2.15/$2.15']
I want the text of the child tags as well as the td tag's own text, but my current lxml code only returns the latter. Instead I want to see:
['RAND 31Jan14', ' 1300m Dead BT-4UEGOPN $000 ', 'Tommy Berry', ' 0kg Barrier 0 ', ' 1st ', 'Glencadam Gold (IRE)', ' 0kg, 3rd ', 'The Offer (IRE)', ' 0kg 1:20.90, 1L ', '\n']
or, preferably, that list joined into a single string:
'RAND 31Jan14 1300m Dead BT-4UEGOPN $000 Tommy Berry 0kg Barrier 0 1st Glencadam Gold (IRE) 0kg, 3rd The Offer (IRE) 0kg 1:20.90, 1L'
I've tried using etree.tostring(xpath, method="text") and looked through the documentation, but no luck.
I would like to work exclusively in lxml so please don't use other libraries like Beautiful Soup. Cheers
Upvotes: 0
Views: 165
Reputation: 879729
The text attribute returns only the text directly inside that Element, before its first child, but the text_content method (available on lxml.html elements) returns all the text contained in an Element or its children:
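import urllib2
import lxml.html as LH

site = "http://racing.racingnsw.com.au/InteractiveForm/HorseAllForm.aspx?HorseCode=ODA0ODQ0MTUy&src=horsesearch"
req = urllib2.Request(site)
page = urllib2.urlopen(req)
# Parse with lxml.html: its elements provide text_content(), plain etree elements do not
root = LH.parse(page)
for td in root.xpath('//*[@id="info-container"]/table[2]/tr[33]/td[2]'):
    # text_content() concatenates the text of the td and all of its descendants
    print(td.text_content())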
yields
RAND 31Jan14 1300m Dead BT-4UEGOPN $000 Tommy Berry 0kg Barrier 0 1st Glencadam Gold (IRE) 0kg, 3rd The Offer (IRE) 0kg 1:20.90, 1L
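To see the difference without hitting the live site, here is a minimal sketch on a made-up fragment. The snippet markup and the id "cell" are assumptions for illustration only, not the real page structure. It compares the child-only text() query from the question with a descendant query, text_content(), and the XPath string() function:

```python
import lxml.html as LH

# Hypothetical markup standing in for the real page: a td whose text
# is interleaved with child tags.
snippet = (
    '<table><tr><td id="cell">'
    '<b>RAND 31Jan14</b> 1300m Dead <a href="#">Tommy Berry</a> 0kg Barrier 0 '
    '</td></tr></table>'
)

root = LH.fromstring(snippet)
td = root.xpath('//td[@id="cell"]')[0]

# text() selects only text nodes that are direct children of the td,
# so the text inside <b> and <a> is missing.
direct_text = td.xpath('text()')

# .//text() selects text nodes at any depth below the td, in document order.
all_text_nodes = td.xpath('.//text()')

# text_content() returns one string; split/join collapses runs of whitespace.
joined = ' '.join(td.text_content().split())

# Pure XPath alternative: string(.) concatenates all descendant text and
# normalize-space() trims and collapses whitespace.
via_xpath = td.xpath('normalize-space(string(.))')

print(direct_text)
print(all_text_nodes)
print(joined)
```

The .//text() and string(.) forms should also work on elements parsed with etree.HTML, so they may be useful if you want to stay with lxml.etree rather than lxml.html.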
Upvotes: 3