Hmd88

Reputation: 13

Duplicate results with XPath but not CSS selectors in Scrapy

I am working through the Scrapy tutorial and trying to scrape the text, author and tags of each quote on the companion website. When I use CSS selectors as shown there:

for quote in response.css('div.quote'):
    print quote.css('span.text::text').extract()
    print quote.css('span small::text').extract()
    print quote.css('div.tags a.tag::text').extract()

I get the desired result (i.e., each quote's text, author and tags printed once). But when I use XPath selectors like this:

for quote in response.xpath("//*[@class='quote']"):
    print quote.xpath("//*[@class='text']/text()").extract()
    print quote.xpath("//*[@class='author']/text()").extract()
    print quote.xpath("//*[@class='tag']/text()").extract()

I get duplicate results!

I still can't figure out why there is such a difference between the two.

Upvotes: 1

Views: 1049

Answers (2)

yasirnazir

Reputation: 1155

When you use //, the search covers the whole response, regardless of which selector you call it on. If you use .//, its scope is limited to that selector. Try .// instead of //. It will solve your problem :-)

Upvotes: 1

Neil B.

Reputation: 86

Try .// instead of // for your relative searches, e.g.:

print quote.xpath(".//*[@class='text']/text()").extract()

When you use //, although you're searching from quote, XPath treats it as an absolute search, so its context is still the root of the document. .//, however, means to search from . - the current node - so the search is limited to the elements nested under quote.

As a side note, if you want exactly the same results, consider changing * to the tags you used in the CSS search - span or div. In this case it doesn't make any difference, but it's a heads-up for future reference.

Upvotes: 4
