Michael

Reputation: 117

How to extract all text under the same tag using XPath?

<span rel="v:addr">
<span property="v:region">
  <a href="https://tabelog.com/en/tokyo/">
    123
  </a>
</span>
<span property="v:locality">
  <a href="https://tabelog.com/en/tokyo/A1317/A131710/rstLst/">
    456
  </a>
    <a href="https://tabelog.com/en/rstLst/">
      789
    </a>
  10
</span>
<span property="v:street-address">

</span>
</span>

I want to extract all the text inside the outer span tag, strip every whitespace character, and join the pieces into one single string.

I want this result:

12345678910

This is my code below:

'AddressLocalityJap':"".join(response.xpath('normalize-space(//*[@id="anchor-rd-detail"]/section[1]/table/tbody/tr[4]/td/p[2]/span/span[2]//text()').extract())

Upvotes: 2

Views: 589

Answers (2)

kjhughes

Reputation: 111726

Pure XPath 1.0 solution

This XPath,

translate(string(normalize-space()), ' ', '')

will return

12345678910

for your HTML, as requested.
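
As a rough check, the expression can be evaluated from Python with lxml. This is only a sketch: it assumes the outer span (rel="v:addr") is the context node, and it drops the redundant string() wrapper, since normalize-space() already returns a string. The names SAMPLE, outer, and result are made up for illustration.

```python
from lxml import html

# Sample markup copied from the question.
SAMPLE = """
<span rel="v:addr">
<span property="v:region">
  <a href="https://tabelog.com/en/tokyo/">
    123
  </a>
</span>
<span property="v:locality">
  <a href="https://tabelog.com/en/tokyo/A1317/A131710/rstLst/">
    456
  </a>
    <a href="https://tabelog.com/en/rstLst/">
      789
    </a>
  10
</span>
<span property="v:street-address">

</span>
</span>
"""

root = html.fromstring(SAMPLE)

# Use the outer span as the context node for the string expression.
outer = root.xpath('//span[@rel="v:addr"]')[0]

# normalize-space() collapses runs of whitespace in the context node's
# string value; translate() then deletes the remaining single spaces.
result = outer.xpath("translate(normalize-space(), ' ', '')")

print(result)  # 12345678910
```

In a Scrapy callback the same expression would presumably be evaluated relative to a selector for the outer span rather than against the whole response.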

Upvotes: 1

stamaimer

Reputation: 6485

You can select all the inner spans with //span/span, get the text of each one with text_content(), and then remove all whitespace characters with a regex.

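import re
from lxml import html

# html_source is assumed to hold the page's HTML as a string.
tree = html.fromstring(html_source)

# Select every span that is a direct child of another span.
span = tree.xpath("//span/span", smart_strings=False)

# Strip all whitespace from each span's text, then join the pieces.
text = ''.join(re.sub(r"\s+", '', item.text_content()) for item in span)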
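
To see what each step contributes, here is a self-contained run of the same approach. It is a sketch on a condensed copy of the question's markup: the href values are shortened to "#" for brevity, and the names inner_spans and parts are hypothetical.

```python
import re
from lxml import html

# Condensed version of the markup from the question (hrefs shortened).
html_source = """
<span rel="v:addr">
  <span property="v:region"><a href="#"> 123 </a></span>
  <span property="v:locality"><a href="#"> 456 </a> <a href="#"> 789 </a> 10 </span>
  <span property="v:street-address"> </span>
</span>
"""

tree = html.fromstring(html_source)

# The three inner spans: region, locality, street-address.
inner_spans = tree.xpath("//span/span")

# text_content() returns all descendant text of an element;
# the regex then removes every whitespace character from it.
parts = [re.sub(r"\s+", "", s.text_content()) for s in inner_spans]
print(parts)  # ['123', '45678910', '']

text = "".join(parts)
print(text)  # 12345678910
```

The empty street-address span contributes an empty string, so it does not affect the joined result.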

Upvotes: 1
