Reputation: 27
This is the code I have at the moment:
>>>p = []
>>>r = root.findall('.//*div/[@class="countdown closed"]/')
>>>r
'<abbr data-utime="1383624000" class="timestamp"/>'
>>>for i in r:
...    s = i.attrib
...    p.append(s['data-utime'])
>>>p
['1383624000']
s yields:
{'class': 'timestamp', 'data-utime': '1383624000'}
I think the code above is verbose (creating a list and using a for loop for only one string).
I know lxml can do this more succinctly, but I have not been able to work out how. I would appreciate any assistance.
Upvotes: 0
Views: 249
Reputation: 1124548
If you are expecting to find just one element, use .find(), not .findall():
r = root.find('.//*div/[@class="countdown closed"]/')
if r is not None:
    p.append(r.attrib['data-utime'])
element.find() returns the element, or None if no match is found. If you are certain that the element is always present, you can omit the if r is not None test.
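As a minimal sketch of the None check described above, the snippet below runs find() against a small made-up document. The SAMPLE markup and the helper name first_utime are assumptions for illustration, modelled on the output shown in the question rather than on the real page. The import falls back to the standard library's ElementTree, which offers the same find() call, in case lxml is not installed.

```python
try:
    from lxml import etree  # the library the question uses
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with the same find() API

# Hypothetical markup, modelled on the output shown in the question.
SAMPLE = (
    '<html><body>'
    '<div class="countdown closed">'
    '<abbr data-utime="1383624000" class="timestamp"/>'
    '</div>'
    '</body></html>'
)

root = etree.fromstring(SAMPLE)


def first_utime(tree):
    """Return the data-utime of the first matching <abbr>, or None."""
    abbr = tree.find('.//div[@class="countdown closed"]/abbr')
    # find() returns None when nothing matches, so guard before reading.
    return abbr.get('data-utime') if abbr is not None else None


print(first_utime(root))                       # attribute value as a string
print(first_utime(etree.fromstring('<html/>')))  # no match, so None
```

Using .get() instead of .attrib[...] also avoids a KeyError if the element exists but lacks the attribute.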
Because you are using lxml, you can use the element.xpath() method, which supports more powerful XPath expressions than the ElementTree methods do. You can add a /@attribute-name step to the path to select the attribute value directly:
attr = root.xpath('.//*div[@class="countdown closed"]/@data-utime')
p.extend(attr)
.xpath() returns a list as well, but you can just use p.extend() to add all contained values to p in one step.
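Here is a small sketch of the attribute-selecting XPath combined with p.extend(). The SAMPLE document (two matching div elements, each wrapping an abbr) is an assumption made up for illustration, so the path is written against that layout and may need adjusting for the real page.

```python
from lxml import etree

# Hypothetical markup with two matching blocks, to show that xpath()
# returns every matching attribute value in document order.
SAMPLE = (
    '<html><body>'
    '<div class="countdown closed">'
    '<abbr data-utime="1383624000" class="timestamp"/>'
    '</div>'
    '<div class="countdown closed">'
    '<abbr data-utime="1383710400" class="timestamp"/>'
    '</div>'
    '</body></html>'
)

root = etree.fromstring(SAMPLE)

p = []
# The trailing /@data-utime step selects the attribute values themselves,
# so no loop over elements is needed.
p.extend(root.xpath('.//div[@class="countdown closed"]/abbr/@data-utime'))

print(p)
```

If no element matches, xpath() returns an empty list and p.extend() simply adds nothing, so no None check is required.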
Upvotes: 1
Reputation: 295766
Use XPath, not the ElementTree findall() (which supports a more limited path language, present for compatibility with the ElementTree library that lxml extends), and address your path all the way down to the attribute:
root.xpath('//html:div[@class="countdown closed"]/@data-utime',
           namespaces={'html': 'http://www.w3.org/1999/xhtml'})
(It is possible to use namespace wildcards in XPath, but it is not great practice -- not only does it leave one open to namespace collisions, it can also be a performance impediment if your engine indexes against fully-qualified attribute names.)
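To illustrate why the namespace mapping matters, here is a rough sketch that assumes the page was parsed as XML and declares the XHTML namespace on its root element. The SAMPLE markup is invented for this example, and it places data-utime directly on the div to match the expression above.

```python
from lxml import etree

XHTML_NS = 'http://www.w3.org/1999/xhtml'

# Hypothetical XHTML document: every element is in the XHTML namespace.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div class="countdown closed" data-utime="1383624000"/>'
    '</body></html>'
)

root = etree.fromstring(SAMPLE)

# An unprefixed name only matches elements in no namespace, so this finds nothing.
unqualified = root.xpath('//div[@class="countdown closed"]/@data-utime')

# Binding a prefix to the XHTML namespace makes the same path match.
qualified = root.xpath(
    '//html:div[@class="countdown closed"]/@data-utime',
    namespaces={'html': XHTML_NS},
)

print(unqualified)
print(qualified)
```

The prefix html is arbitrary; only the namespace URI it is bound to has to match the document. If the page was parsed with an HTML parser that does not assign a namespace, the unprefixed form is the one that matches instead.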
Upvotes: 3