Reputation: 310
The question may sound easy, but I am having difficulty solving it. I have a table like the following:
<table><tbody>
<tr>
<td>2003</td>
<td><span class="positive">1.19</span> </td>
<td><span class="negative">-0.48</span> </td>
</tr>
My code is as follows:
from lxml import etree
for elem in tree.xpath('//*[@id="printcontent"]/div[8]/div/table/tbody/tr'):
    for c in elem.xpath("//td"):
        if(c.getchildren()): # for the <span> thing
            text = c.xpath("//span/text()")
        else:
            text = c.text
But I am unable to iterate over the "td" elements. I have been trying all day, but to no avail! I want to get 2003, 1.19, and -0.48.
Kindly help!
Upvotes: 3
Views: 8532
Reputation: 879471
It looks like you have HTML, not XML. Therefore, use lxml.html, not lxml.etree, to parse the data. If data.html looks like this:
<table><tbody>
<tr>
<td>2003</td>
<td><span class="positive">1.19</span> </td>
<td><span class="negative">-0.48</span> </td>
</tr>
then
import lxml.html as LH
tree = LH.parse('data.html')
print([td.text_content() for td in tree.xpath('//td')])
yields
['2003', '1.19 ', '-0.48 ']
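To keep the values grouped by row, as the loop in the question tries to do, the inner XPath has to be relative. In lxml, an expression that starts with `//` searches from the document root even when it is called on an element, so `elem.xpath("//td")` matches every `td` in the document; `./td` (or `.//td`) restricts the search to the current element. A minimal sketch, assuming the HTML is held in a string (the `html` variable and the `rows` list are illustrative names, not from the question):

```python
import lxml.html as LH

# Sample markup modelled on the table in the question (assumed, closed properly here).
html = """
<table><tbody>
<tr>
<td>2003</td>
<td><span class="positive">1.19</span> </td>
<td><span class="negative">-0.48</span> </td>
</tr>
</tbody></table>
"""

root = LH.fromstring(html)

rows = []
for tr in root.xpath('//tr'):
    # './td' is relative to this row; '//td' would match every td in the document.
    cells = [td.text_content().strip() for td in tr.xpath('./td')]
    rows.append(cells)

print(rows)
```

Using `text_content()` avoids the need to branch on whether a cell contains a `span`, and `strip()` drops the trailing space left after the `span`.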
If
for elem in tree.xpath('//*[@id="printcontent"]/div[8]/div/table/tbody/tr'):
is not returning any elems, then you need to show us enough HTML to help us debug why this XPath is not working.
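In the meantime, one way to narrow down where an XPath stops matching is to evaluate progressively longer prefixes of it. The sketch below uses a hypothetical helper, `first_failing_prefix`, and an invented sample page; it is not based on the asker's actual HTML. A frequent cause, though only a guess here, is a `tbody` step copied from a browser's developer tools: browsers add `tbody` to the DOM even when the source HTML has none, while lxml parses the source as written.

```python
import lxml.html as LH

def first_failing_prefix(root, steps):
    """Return the shortest '/'-joined prefix of `steps` that matches nothing, else None.

    Hypothetical helper for illustration; `steps` is a list of XPath steps.
    """
    for i in range(1, len(steps) + 1):
        prefix = '/'.join(steps[:i])
        if not root.xpath(prefix):
            return prefix
    return None

# Invented sample page: note there is no <tbody> in the source.
html = """
<html><body>
<div id="printcontent">
<table>
<tr><td>2003</td><td><span class="positive">1.19</span> </td></tr>
</table>
</div>
</body></html>
"""

root = LH.fromstring(html)
steps = ['//*[@id="printcontent"]', 'table', 'tbody', 'tr']
failing = first_failing_prefix(root, steps)
print(failing)
```

If the reported prefix ends in `tbody`, dropping that step from the XPath is worth trying.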
Upvotes: 6