Reputation: 63
Here is some sample HTML
<div class="something">
<p> This is a <b> Paragraph </b> with <a href="/something"> mixed </a> elements
<p> Next paragraph....
</div>
What I tried was
//div[contains('@class','something')/text()
and
//div[contains('@class','something')/*/text()
and
//div[contains('@class','something')/p/text()
All of these seem to skip the text inside the 'b' tags and the 'a' tags.
Upvotes: 1
Views: 746
Reputation: 2975
Try
" ".join(sel.xpath("//div[contains(@class,'something')]//text()").extract())
where sel is your selector (in your case, this may be response).
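To see why /p/text() drops the text inside b and a while //text() keeps it, here is a minimal sketch. It assumes the lxml library instead of a Scrapy selector (the XPath expressions are the same), and the variable names are only illustrative.

```python
from lxml import html

# Sample markup from the question (the <p> tags are left unclosed on purpose).
SAMPLE = """<div class="something">
<p> This is a <b> Paragraph </b> with <a href="/something"> mixed </a> elements
<p> Next paragraph....
</div>"""

tree = html.fromstring(SAMPLE)

# /p/text() selects only text nodes that are direct children of each <p>,
# so the text inside <b> and <a> is not part of the result.
direct = tree.xpath("//div[contains(@class,'something')]/p/text()")

# //text() selects text nodes at any depth below the <div>.
nested = tree.xpath("//div[contains(@class,'something')]//text()")

# Join the pieces and collapse runs of whitespace.
joined = " ".join(" ".join(nested).split())
print(joined)  # This is a Paragraph with mixed elements Next paragraph....
```

With a Scrapy response, the equivalent of nested would be the list returned by .extract() on the same expression.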
Upvotes: 3
Reputation: 10220
It depends on what you want to obtain and how. In any case, there are a couple of problems with what you tried:
The closing bracket (]) is missing after contains(...) in the XPath expression.
@class should not be enclosed in (single) quotes when used inside contains; quoted, it is a string literal rather than the attribute.
If you want to get all the text of the div element as one string, you might use
normalize-space(//div[contains(@class,'something')])
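As a rough illustration of the normalize-space approach, the sketch below evaluates that expression with lxml (an assumption; the thread itself uses a Scrapy selector). Note that in XPath 1.0, normalize-space applied to a node-set uses only the first matching node, so this returns the text of the first matching div.

```python
from lxml import html

# Sample markup from the question.
SAMPLE = """<div class="something">
<p> This is a <b> Paragraph </b> with <a href="/something"> mixed </a> elements
<p> Next paragraph....
</div>"""

tree = html.fromstring(SAMPLE)

# normalize-space() takes the string value of the first matching <div>
# (all descendant text concatenated), trims it, and collapses whitespace runs.
text = tree.xpath("normalize-space(//div[contains(@class,'something')])")
print(text)  # This is a Paragraph with mixed elements Next paragraph....
```

The result is a single string, not a list, so no join step is needed.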
Upvotes: 1
Reputation: 29052
Use the XPath expression
//div[contains(@class,'something')]//text()
to select all the text() nodes inside the chosen div element, at any depth. Concatenated, they give the full text of the div.
Output:
This is a Paragraph with mixed elements
Next paragraph....
Upvotes: 2