How to extract all text under the same tag using XPath?

Question

I want to extract all the text inside the span tags, strip out the whitespace, and join it into one single string.

I want this result:

12345678910

This is my code:

'AddressLocalityJap':"".join(response.xpath('normalize-space(//*[@id="anchor-rd-detail"]/section[1]/table/tbody/tr[4]/td/p[2]/span/span[2]//text()').extract())

stamaimer · Accepted Answer

You can select all the nested spans with //span/span, get the text of each span with text_content(), and remove all whitespace characters with a regular expression.

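import re
from lxml import html

# html_source is the page's HTML as a string (e.g. response.text in Scrapy)
tree = html.fromstring(html_source)

# Select every span that is a direct child of another span
spans = tree.xpath("//span/span", smart_strings=False)

# Strip all whitespace from each span's text, then join the pieces
text = ''.join(re.sub(r"\s+", '', item.text_content()) for item in spans)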
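If you want to try the idea without a real page, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree in place of lxml. It is a swapped-in technique, not the code above: ElementTree supports only a subset of XPath and needs well-formed markup. The SAMPLE markup and the joined_text helper are made up for illustration and are not taken from the question's page.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample markup (not the asker's page); it must be well-formed XML.
SAMPLE = """
<td>
  <p>
    <span>
      <span> 123 </span>
      <span> 45 <b>678</b> 910 </span>
    </span>
  </p>
</td>
"""


def joined_text(markup, path=".//span/span"):
    """Concatenate the text of every element matching `path`, dropping all whitespace."""
    root = ET.fromstring(markup)
    parts = []
    for element in root.findall(path):
        # itertext() yields the element's own text and the text of its
        # descendants in document order, similar to lxml's text_content().
        raw = "".join(element.itertext())
        # Remove spaces, tabs and newlines inside each piece.
        parts.append(re.sub(r"\s+", "", raw))
    return "".join(parts)


text = joined_text(SAMPLE)
print(text)
```

Note that a path like //span/span matches every span whose parent is a span, so with deeper nesting (span/span/span) the innermost text would be collected more than once. In that case a more specific path, like the one in the question, is probably the safer choice.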

