Xpath normalize-space

Question

I'm feeling dumb. Python & XPath newbie here. I'm trying to extract the complete text 'Open Box Price: $1079.99' using XPath from:



     Open Box Price:
$1079.99
    

    Regular Price: $1499.98

But I can't: the text stops at . Here's my code:

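import lxml.html

doc = lxml.html.fromstring(r.content)  # r: the page fetched earlier (not shown)
elements = doc.xpath(item_xpath)       # item_xpath: defined elsewhere (not shown)
print(elements[1].find('div[3]/p[1]/text()[normalize-space()]'))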

Jon Clements · Accepted Answer

A starting point for the XPath you want is the descendant-or-self axis; tweak the result however you need:

>>> doc.xpath('//p[1]/descendant-or-self::text()')
['\n    ', ' Open Box Price:', '$1079.99', '\n    ']
>>> doc.xpath('//p[2]/descendant-or-self::text()')
['\n    Regular Price: ', '$1499.98', '\n    ']
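A minimal sketch of one way to tweak that result. The question's original HTML was not preserved, so the markup below is hypothetical: it assumes the label and the amount sit in two adjacent <span> elements, which is consistent with the text nodes shown above. It contrasts text(), which only sees the paragraph's own child text nodes, with .//text() and XPath's normalize-space():

```python
import lxml.html

# Hypothetical markup: the question's HTML was not preserved, so this assumes
# the label and the amount sit in two adjacent <span> elements inside <p>.
HTML = """<div>
    <p>
    <span> Open Box Price:</span><span>$1079.99</span>
    </p>
    <p>
    Regular Price: <span>$1499.98</span>
    </p>
</div>"""

doc = lxml.html.fromstring(HTML)
p = doc.xpath('//p[1]')[0]

# text() selects only the <p>'s own child text nodes; the [normalize-space()]
# predicate drops whitespace-only ones. Text inside the <span>s is never seen.
direct = p.xpath('text()[normalize-space()]')

# .//text() reaches every descendant text node, including those in the spans;
# strip each non-blank node and join them with single spaces.
nodes = p.xpath('.//text()[normalize-space()]')
joined = ' '.join(t.strip() for t in nodes)

# normalize-space(.) works on the string-value of the <p>: all descendant text
# concatenated, trimmed, and with inner whitespace runs collapsed to one space.
normalized = p.xpath('normalize-space(.)')

print(direct)
print(joined)
print(normalized)
```

Note the difference in the last two results: normalize-space() concatenates descendant text without inserting separators, so text from adjacent elements runs together ('Open Box Price:$1079.99'), while joining the individual text nodes puts a space between them ('Open Box Price: $1079.99').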

Or, since you're using lxml.html, you could use text_content():

paras = doc.xpath('//p')  # or findall(), etc.
for para in paras:
    print(para.text_content())
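A sketch of the text_content() route with the whitespace tidied up, again on hypothetical markup (the question's HTML was not preserved) with the amount in a separate <span>. text_content() concatenates all descendant text, so splitting and re-joining collapses the indentation much as normalize-space() would; iterating with itertext() and joining on a space additionally keeps a gap between adjacent elements:

```python
import lxml.html

# Hypothetical markup, assumed for illustration only.
HTML = """<div>
    <p>
    <span> Open Box Price:</span><span>$1079.99</span>
    </p>
    <p>
    Regular Price: <span>$1499.98</span>
    </p>
</div>"""

doc = lxml.html.fromstring(HTML)
paras = doc.xpath('//p')

# text_content() glues descendant text together with no separator;
# split()/join then collapses every whitespace run to a single space.
compact = [' '.join(p.text_content().split()) for p in paras]

# itertext() yields the text pieces one by one, so joining them on a space
# first keeps adjacent elements from running together.
spaced = [' '.join(' '.join(p.itertext()).split()) for p in paras]

for line in spaced:
    print(line)
```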
