user9424364

How to get text of selected elements in XPath?

I am trying to extract several forum posts using a standard XPath expression:

response.xpath('.//div[contains(@class, "Message userContent")]')

That returns the complete list of comments, as expected.

But once I append //text() or wrap the expression in string(...), the list grows to 100 or 150 items. That makes it impractical to iterate over the list and join it with other data such as the author or the date.

normalize-space(...) only returns the first comment.

It probably has something to do with all the newlines and line breaks in the HTML, but at this stage I have no idea how to handle them.

Would string-join(...[normalize-space()]) be an option here?

Upvotes: 1

Views: 4303

Answers (1)

kjhughes

Reputation: 111501

Realize what each XPath is selecting:

  1. .//div[contains(@class, "Message userContent")] selects div elements.
  2. .//div[contains(@class, "Message userContent")]//text() selects all text node descendants of those div elements.
  3. normalize-space(.//div[contains(@class, "Message userContent")]) in XPath 1.0 takes the space-normalized string value of the first such div element.
  4. normalize-space(.//div[contains(@class, "Message userContent")]) in XPath 2.0 is an error whenever more than one such div is selected, because normalize-space() accepts at most one item, not a sequence.
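
To make the difference between selecting elements and selecting text nodes concrete, here is a minimal sketch using only Python's standard library. The SAMPLE markup is hypothetical and stands in for the real forum page, and the class check is done in Python because ElementTree's XPath subset has no contains().

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the forum page (an assumption).
SAMPLE = """
<html><body>
  <div class="Message userContent">
    First   comment,<br/>
    with a line break.
  </div>
  <div class="Message userContent">
    Second <b>bold</b> comment.
  </div>
  <div class="Other">Not a comment.</div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# Counterpart of selecting the div elements: one item per comment.
divs = [d for d in root.iter("div")
        if "Message userContent" in d.get("class", "")]

# Counterpart of appending //text(): one item per text fragment,
# so every <br/> or inline tag splits a comment into more items.
text_nodes = [t for d in divs for t in d.itertext()]

print(len(divs))        # 2
print(len(text_nodes))  # 5
```

Two comments yield five text fragments here, which is the same effect that inflates the list to 100 or 150 items on a real page.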

If you want to get the string values of each such div:

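  • XPath 1.0: Iterate over the selected div elements in the hosting language and take the string value of each one separately, for example by evaluating string(.) or normalize-space(.) relative to that element.
  • XPath 2.0: Append /string() to the XPath, or /normalize-space() if the whitespace should be collapsed as well.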
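
The XPath 1.0 approach (iterate in the hosting language, one string value per div) can be sketched as follows. This is an illustration under stated assumptions, not the asker's actual code: the SAMPLE markup and the comment_texts helper are hypothetical, and it uses the standard library's ElementTree rather than Scrapy so that it runs on its own. The whitespace collapsing with split() and join() only roughly mirrors normalize-space().

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the forum page (an assumption).
SAMPLE = """
<html><body>
  <div class="Message userContent">
    First   comment,<br/>
    with a line break.
  </div>
  <div class="Message userContent">
    Second <b>bold</b> comment.
  </div>
  <div class="Other">Not a comment.</div>
</body></html>
"""

def comment_texts(markup):
    """Return one whitespace-normalized string per comment div."""
    root = ET.fromstring(markup.strip())
    texts = []
    for div in root.iter("div"):
        # Same intent as contains(@class, "Message userContent").
        if "Message userContent" in div.get("class", ""):
            # String value of the element: all descendant text, concatenated.
            raw = "".join(div.itertext())
            # Collapse runs of whitespace, roughly like normalize-space().
            texts.append(" ".join(raw.split()))
    return texts

comments = comment_texts(SAMPLE)
for comment in comments:
    print(comment)
```

In Scrapy the same idea would be a loop over the selectors returned by response.xpath(...), calling div.xpath('normalize-space(.)').get() on each one. Because the result has exactly one string per div, it stays aligned with per-post data such as author or date.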

Upvotes: 3
