Reputation: 3288
With lxml.html, how do I access single elements without using a for loop?
This is the HTML:
<tr class="headlineRow">
<td>
<span class="headline">This is some awesome text</span>
</td>
</tr>
For example, this will fail with IndexError:
for row in doc.cssselect('tr.headlineRow'):
    headline = row.cssselect('td span.headline')
    print(headline[0])
This will pass:
for row in doc.cssselect('tr.headlineRow'):
    headline = row.cssselect('td span.headline')
    for first_thing in headline:
        print(headline[0].text_content())
Upvotes: 0
Reputation: 14284
I usually use the xpath method for things like this. It returns a list of matching elements.
>>> spans = doc.xpath('//tr[@class="headlineRow"]/td/span[@class="headline"]')
>>> spans[0].text
'This is some awesome text'
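Since xpath returns a plain list, indexing with [0] raises IndexError when nothing matches. Here is a minimal sketch of one way to guard against that. The helper name first_match and the variable names are made up for illustration, and the row is wrapped in a table element here as an assumption, so that the input is well-formed HTML.

```python
import lxml.html

# Assumption: the <tr> from the question is placed inside a <table>
# so that the parsed input is well-formed HTML.
page = lxml.html.fromstring(
    '<table><tr class="headlineRow">'
    '<td><span class="headline">This is some awesome text</span></td>'
    '</tr></table>'
)


def first_match(element, xpath_expr):
    """Hypothetical helper: first element matched by xpath_expr, or None."""
    matches = element.xpath(xpath_expr)  # xpath() returns a list here
    return matches[0] if matches else None


headline = first_match(page, '//tr[@class="headlineRow"]/td/span[@class="headline"]')
missing = first_match(page, '//span[@class="no-such-class"]')

# Only touch the element if something was actually found.
if headline is not None:
    print(headline.text_content())
```

Returning None instead of raising keeps the "no match" case explicit at the call site, which is usually easier to handle than catching IndexError.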
Upvotes: 1
Reputation: 18375
Elements are accessed the same way you access nested lists:
>>> doc[0][0]
<Element span at ...>
Or via CSS selectors:
doc.cssselect('td span.headline')[0]
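As a rough sketch of the list-style access, assuming the snippet is parsed with etree.fromstring so that the tr element is the root (variable names are illustrative only):

```python
from lxml import etree

snippet = (
    '<tr class="headlineRow">'
    '<td><span class="headline">This is some awesome text</span></td>'
    '</tr>'
)
row = etree.fromstring(snippet)  # root element is the <tr>

# Child indexing: row[0] is the <td>, row[0][0] is the <span>.
span_by_index = row[0][0]

# find() returns the first matching descendant path, or None if there is none,
# so no list indexing is needed.
span_by_path = row.find('td/span')

headline_text = span_by_path.text if span_by_path is not None else None
print(headline_text)
```

Indexing depends on the exact position of each child, so find() with a path is probably the safer choice when the markup may change.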
Upvotes: 0
Reputation: 28696
Your "failing" example works perfectly for me. Either you made a mistake when trying it out, or you are using an older version of lxml with a bug that has since been fixed. I tried 2.2.6 and 2.1.1 (the oldest I had around), and both worked.
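To check which version is installed, something like this should do. It is only a sketch: it reads the version tuples that lxml exposes on the etree module, and the variable names are illustrative.

```python
from lxml import etree

# Version of lxml itself, as a tuple of integers.
lxml_version = etree.LXML_VERSION
# Version of the libxml2 library that lxml is running against.
libxml_version = etree.LIBXML_VERSION

print("lxml:", ".".join(str(part) for part in lxml_version))
print("libxml2:", ".".join(str(part) for part in libxml_version))
```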
Upvotes: 0
Reputation: 74795
I tried out your example using CSSSelector, and headline[0] worked fine. See below:
>>> html ="""<tr class="headlineRow">
<td>
<span class="headline">This is some awesome text</span>
</td>
</tr>"""
>>> from lxml import etree
>>> from lxml.cssselect import CSSSelector
>>> doc = etree.fromstring(html)
>>> sel1 = CSSSelector('tr.headlineRow')
>>> sel2 = CSSSelector('td span.headline')
>>> for row in sel1(doc):
...     headline = sel2(row)
...     print(headline[0])
...
<Element span at 8f31e3c>
Upvotes: 0