Reputation: 37
I want to convert a CSS selector to XPath in a Scrapy project.
I'm learning Scrapy from the tutorial on its website, and I'm having trouble translating its CSS selectors directly into XPath.
The CSS selectors used to parse http://quotes.toscrape.com/ are:
`>>> for quote in response.css("div.quote"):
...     text = quote.css("span.text::text").extract_first()
...     author = quote.css("small.author::text").extract_first()
...     tags = quote.css("div.tags a.tag::text").extract()
...     print(dict(text=text, author=author, tags=tags))`
I tried rewriting it with XPath as:
`In [83]: for quote in response.xpath('//div[@class="quote"]'):
...:     text = quote.xpath('//span[@class="text"]/text()').extract_first()
...:     author = quote.xpath('//small[@class="author"]/text()').extract_first()
...:     tags = quote.xpath('//div[@class="tags"]/a[@class="tag"]/text()').extract()
...:     print(dict(text=text, author=author, tags=tags))`
In the CSS path I get info on different quotes, while on XPath I get the same quote multiple times in the list. What am I doing wrong?
Upvotes: 1
Views: 248
Reputation: 89285
"In the CSS path I get info on different quotes, while on XPath I get the same quote multiple times in the list. What am I doing wrong?"
The main problem is that XPath treats a `/` at the beginning of an expression as a reference to the document root, regardless of the context element the expression is evaluated on. So `//span[@class="text"]` matches every such span in the whole page, and `extract_first()` returns the first quote's text on every iteration. To evaluate the expression relative to the current context element (the one referenced by the variable `quote`), add a `.` at the beginning, for example:
text = quote.xpath('.//span[@class="text"]/text()').extract_first()
Upvotes: 2