George Sonancia

Reputation: 33

How to use contains() in XPath?

I am trying to collect information from a webpage and cannot get the correct XPath to find it. Here is a piece from a website:

<div class="posted">
  <div>
    June 20, 2018
  </div>
</div>

I want to search each page for the div whose class is "posted", then return everything under it as a string. (A messy string is OK; I will just check if "2018" in possibleDate to find the year.) Here is what I am trying:

possibleDate = str(tree.xpath("//div[contains(@class, ’posted’)]//@text"))

It says that it is an invalid expression.
What am I doing wrong?

Upvotes: 3

Views: 2637

Answers (1)

kjhughes

Reputation: 111491

First, replace the ’ characters surrounding posted with ' characters.

Next, replace @text with text() to eliminate your XPath syntax error.
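As a minimal sketch of those two fixes together, assuming tree comes from lxml (the question's tree.xpath call suggests it, but that is an assumption) and using the sample markup from the question wrapped in a hypothetical html/body shell:

```python
from lxml import html

# Sample markup from the question, wrapped in a minimal page (the wrapper is illustrative).
page = """
<html><body>
<div class="posted">
  <div>
    June 20, 2018
  </div>
</div>
</body></html>
"""

tree = html.fromstring(page)

# Straight quotes around 'posted', and text() in place of @text.
text_nodes = tree.xpath("//div[contains(@class, 'posted')]//text()")

# xpath() returns a list of text nodes here, including whitespace-only ones,
# so drop the empty pieces and join the rest into one string.
possible_date = " ".join(t.strip() for t in text_nodes if t.strip())

print(possible_date)
```

Note that calling str() directly on the list would give the list's repr (brackets, quotes, and escaped newlines), which is still searchable for "2018" but messier than joining the pieces.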

Also, you might want to use the space-normalized string value of the selected div rather than selecting text nodes:

possibleDate = str(tree.xpath("normalize-space(//div[@class='posted'])"))

This will abstract across mark-up variations nested within the targeted div.
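To illustrate, here is a rough sketch, again assuming lxml, that runs the same normalize-space() expression against two markup variants. The second variant is made up for the example and is not from the question:

```python
from lxml import html

# Two ways the same date might be marked up; the second is hypothetical.
variants = [
    '<div class="posted">\n  <div>\n    June 20, 2018\n  </div>\n</div>',
    '<div class="posted">Posted on <b>June 20, 2018</b></div>',
]

results = []
for markup in variants:
    tree = html.fromstring("<html><body>" + markup + "</body></html>")
    # normalize-space() takes the string value of the first matching div
    # (all descendant text concatenated), trims it, and collapses runs of
    # whitespace to single spaces. The result is a string, not a list.
    possible_date = tree.xpath("normalize-space(//div[@class='posted'])")
    results.append(possible_date)

for r in results:
    print(repr(r), "2018" in r)
```

Because the result is already a string, the str() wrapper is optional. If no div matches, the expression yields an empty string rather than raising.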

See also: xpath: find a node whose class attribute matches a value and whose text contains a certain string

Upvotes: 1
