Reputation: 379
I have this HTML:
<div class="abc">
<div class="xyz">
<div class="needremove"></div>
<p>text</p>
<p>text</p>
<p>text</p>
<p>text</p>
</div>
</div>
I used: response.xpath('//div[contains(@class,"abc")]/div[contains(@class,"xyz")]').extract()
Result:
[u'<div class="xyz">
<div class="needremove"></div>
<p>text</p>
<p>text</p>
<p>text</p>
<p>text</p>
</div>']
I want to remove <div class="needremove"></div> from the result. Could you help me?
Upvotes: 1
Views: 1408
Reputation: 473763
You can select only the child elements of the "xyz" div that are not div elements and whose class does not contain "needremove":
response.xpath('//div[contains(@class, "abc")]/div[contains(@class, "xyz")]/*[local-name() != "div" and not(contains(@class, "needremove"))]').extract()
Demo from the shell:
$ scrapy shell index.html
In [1]: response.xpath('//div[contains(@class, "abc")]/div[contains(@class, "xyz")]/*[local-name() != "div" and not(contains(@class, "needremove"))]').extract()
Out[1]: [u'<p>text</p>', u'<p>text</p>', u'<p>text</p>', u'<p>text</p>']
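Note that the expression above skips every div child, not only the one with class "needremove". If the real page has other div children you want to keep, a narrower predicate may fit better. Below is a minimal sketch, assuming lxml (the library Scrapy selectors are built on) and a made-up sample with an extra `<div class="keep">` that is not in the original HTML. The names SAMPLE, broad and narrow are only for illustration. The last part shows one possible way to get the "xyz" div itself without the unwanted child, by removing the node from the tree before serializing.

```python
from lxml import html

# Hypothetical sample: the question's structure plus an extra div to keep.
SAMPLE = """
<div class="abc">
  <div class="xyz">
    <div class="needremove"></div>
    <p>text</p>
    <div class="keep">kept</div>
    <p>text</p>
  </div>
</div>
"""

tree = html.fromstring(SAMPLE)

# Expression from the answer: drops every <div> child.
broad = ('//div[contains(@class, "abc")]/div[contains(@class, "xyz")]'
         '/*[local-name() != "div" and not(contains(@class, "needremove"))]')

# Narrower variant: drops only <div> children whose class contains "needremove".
narrow = ('//div[contains(@class, "abc")]/div[contains(@class, "xyz")]'
          '/*[not(self::div[contains(@class, "needremove")])]')

broad_tags = [(el.tag, el.get("class")) for el in tree.xpath(broad)]
narrow_tags = [(el.tag, el.get("class")) for el in tree.xpath(narrow)]

# Alternative: remove the unwanted node from the tree, then serialize the parent.
# Note that lxml's remove() also discards the text that follows the removed node.
unwanted = ('//div[contains(@class, "xyz")]'
            '/div[contains(@class, "needremove")]')
for node in tree.xpath(unwanted):
    node.getparent().remove(node)

xyz = tree.xpath('//div[contains(@class, "xyz")]')[0]
cleaned = html.tostring(xyz, encoding="unicode", with_tail=False)
print(cleaned)
```

The same XPath strings should work unchanged with response.xpath(...) in a spider. The removal step works on lxml elements here. Inside Scrapy you would have to reach the underlying element through the selector, so treat that part as a sketch rather than drop-in code.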
Upvotes: 1