James

Reputation: 3815

Scrapy: Using CSS Selectors to exclude a node/tag

In the documentation and on SO, I have only found references on how to exclude elements by CSS class, using syntax like this:

response.css("div[id='content']:not([class*='infobox'])")

What I want to achieve, however, is to exclude a node, or even multiple nodes, such as the <span> and <div> elements inside an <li> element.

Let me give you an example. Let's say I am scraping this HTML:

<li class="classA">
  <div class="classB">
    ..
  </div>

  <span class="classC">Whatever</span>

  This is the string I want to scrape
</li>

I am only interested in scraping the text "This is the string I want to scrape", so I want to skip both the <div> and the <span> nodes. I tried the following in the Scrapy shell, to no avail:

response.css(".classA:not(span|div)::text").extract()

However, I am still getting the excluded nodes.

Upvotes: 0

Views: 2148

Answers (2)

Vipool

Reputation: 108

It's very easy:

1. Using a CSS selector:

response.css('li.classA::text').extract_first()

2. Using an XPath selector:

response.xpath('//li[@class = "classA"]/text()').extract_first()

Upvotes: 2

Easy:

response.css('li::text').extract_first()

Upvotes: 1
