Using nth-child in Scrapy

Question

I'm trying to extract some HTML using the Python tool Scrapy.

My selector is as follows:

#navigation > nav > div.js-accordion-menu-wrapper > ul li:nth-child(n+5):nth-child(-n+10) > a::attr(href)

For some reason this isn't working at all. Specifically, it seems that 'nth-child(-n + x)' just doesn't work, as if Scrapy doesn't support it or doesn't allow it.

Can anyone confirm this?

paul trmbrth · Accepted Answer

Scrapy 1.2.1 with cssselect 1.0.0 seems to be working as expected.

Here's a sample scrapy shell session:

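In [1]: selector = scrapy.Selector(text="""
   ...: <ul>
   ...:     <li>1</li>
   ...:     <li>2</li>
   ...:     <li>3</li>
   ...:     <li>4</li>
   ...:     <li>5</li>
   ...:     <li>6</li>
   ...:     <li>7</li>
   ...:     <li>8</li>
   ...:     <li>9</li>
   ...:     <li>10</li>
   ...:     <li>11</li>
   ...:     <li>12</li>
   ...: </ul>
   ...: """)

In [2]: selector.css('ul li:nth-child(n+5)').extract()
Out[2]:
['<li>5</li>',
 '<li>6</li>',
 '<li>7</li>',
 '<li>8</li>',
 '<li>9</li>',
 '<li>10</li>',
 '<li>11</li>',
 '<li>12</li>']

In [3]: selector.css('ul li:nth-child(n+5):nth-child(-n+10)').extract()
Out[3]:
['<li>5</li>',
 '<li>6</li>',
 '<li>7</li>',
 '<li>8</li>',
 '<li>9</li>',
 '<li>10</li>']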
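
If it helps to see why chaining the two pseudo-classes gives a range, here is a rough plain-Python sketch of the an+b rule: a child at 1-based position i matches when i = a*n + b for some integer n >= 0. This only illustrates the arithmetic; it is not how cssselect evaluates the selector (cssselect translates CSS to XPath), and matches_nth is a hypothetical helper name, not part of Scrapy or cssselect.

```python
def matches_nth(a, b, index):
    """Return True if a 1-based child index equals a*n + b for some integer n >= 0."""
    if a == 0:
        # With no "n" term, only the exact position b matches.
        return index == b
    n, remainder = divmod(index - b, a)
    return remainder == 0 and n >= 0


positions = range(1, 13)  # 12 children, as in the sample session

# li:nth-child(n+5) means a=1, b=5: positions 5, 6, 7, ...
from_fifth = [i for i in positions if matches_nth(1, 5, i)]

# li:nth-child(-n+10) means a=-1, b=10: positions 10, 9, 8, ... down to 1
up_to_tenth = [i for i in positions if matches_nth(-1, 10, i)]

# Chaining both pseudo-classes keeps only positions that satisfy both rules.
both = [i for i in positions if matches_nth(1, 5, i) and matches_nth(-1, 10, i)]

print(from_fifth)   # [5, 6, 7, 8, 9, 10, 11, 12]
print(up_to_tenth)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(both)         # [5, 6, 7, 8, 9, 10]
```

So 'n+5' sets the lower bound, '-n+10' sets the upper bound, and the intersection is positions 5 through 10.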

I'm using:

$ scrapy version -v
Scrapy    : 1.2.1
lxml      : 3.6.4.0
libxml2   : 2.9.4
Twisted   : 16.5.0
Python    : 3.5.0+ (default, Oct 11 2015, 09:05:38) - [GCC 5.2.1 20151010]
pyOpenSSL : 16.2.0 (OpenSSL 1.0.2g  1 Mar 2016)
Platform  : Linux-4.4.0-47-generic-x86_64-with-Ubuntu-16.04-xenial

$ pip freeze | grep cssselect
cssselect==1.0.0
