Reputation: 33
I am trying to collect information from a webpage and cannot get the correct XPath to find it. Here is a piece from a website:
<div class="posted">
<div>
June 20, 2018
</div>
</div>
I want to search each page for the div with class "posted", then return everything under it as a string. (A messy string is OK; I will just use if "2018" in possibleDate to search for the year.) Here is what I am trying:
possibleDate = str(tree.xpath("//div[contains(@class, ’posted’)]//@text"))
It says that it is an invalid expression.
What am I doing wrong?
Upvotes: 3
Views: 2637
Reputation: 111491
First, replace the ’ characters surrounding posted with ' characters.
Next, replace @text with text() to eliminate your XPath syntax error.
Also, you might want to use the space-normalized string value of the selected div rather than selecting text nodes:
possibleDate = str(tree.xpath("normalize-space(//div[@class='posted'])"))
This will abstract across mark-up variations nested within the targeted div.
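For comparison, here is a minimal sketch of both approaches run against the markup from the question. It assumes tree comes from lxml (the tree.xpath call suggests so, but the question does not say); the SAMPLE string and the variable names text_nodes and possible_date are made up for illustration.

```python
from lxml import html  # assumption: the question's `tree` is an lxml tree

# Markup from the question, wrapped in a minimal page (the wrapper is illustrative).
SAMPLE = """
<html><body>
<div class="posted">
<div>
June 20, 2018
</div>
</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# text() selects text nodes, so the result is a list of raw strings
# that still carry the surrounding newlines.
text_nodes = tree.xpath("//div[contains(@class, 'posted')]//text()")

# normalize-space() returns one string: the string value of the first
# matching div, trimmed and with inner whitespace runs collapsed.
possible_date = tree.xpath("normalize-space(//div[@class='posted'])")

print(text_nodes)
print(repr(possible_date))
print("2018" in possible_date)
```

With the text() form you get a list and have to join or strip the pieces yourself; with normalize-space() the result is already a single clean string, so the str() wrapper is optional.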
See also: xpath: find a node whose class attribute matches a value and whose text contains a certain string
Upvotes: 1