When using scrapy, crawled 0 pages (at 0 pages/min) scraped 0 items (at 0 items/min)

Question

I just began to learn Python and Scrapy.

My first project is to crawl a website that publishes web security information. But when I run the spider from cmd, it says

crawled 0 pages (at 0 pages/min) scraped 0 items (at 0 items/min)

and nothing seems to come out. I'd be grateful if someone kind could solve my problem.

Following is my spider file:

from ssl_abuse.items import SslAbuseItem
import scrapy

class SslAbuseSpider(scrapy.Spider):
    name='ssl_abuse'
    start_urls=['https://sslbl.abuse.ch/']
    def parse(self, response):
        for sel in response.xpath('/table//tr'):
            item=SslAbuseItem()
            item['date']=sel.xpath('/td/text()')[0].extract()
            item['name']=sel.xpath('/td/text()')[2].extract()
            item['type']=sel.xpath('/td/text()')[3].extract()
            yield item

Following is the website I'm about to crawl:

https://sslbl.abuse.ch/

I want to get every column of that table except the SHA1 fingerprint.

After I changed my code as Will suggested, this error comes up:

2017-01-04 09:31:40 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-01-04 09:31:40 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-01-04 09:31:42 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: None)
2017-01-04 09:31:52 [scrapy.core.engine] DEBUG: Crawled (200)  (referer: None)
2017-01-04 09:31:53 [scrapy.core.scraper] ERROR: Spider error processing  (referer: None)
Traceback (most recent call last):
  File "c:\python27\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "c:\python27\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "c:\python27\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 22, in 
    return (_set_referer(r) for r in result or ())
  File "c:\python27\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in 
    return (r for r in result or () if _filter(r))
  File "c:\python27\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in 
    return (r for r in result or () if _filter(r))
  File "V:\work\ssl_abuse\ssl_abuse\spiders\ssl_abuse_spider.py", line 11, in parse
    item['date']=sel.xpath('/td/text()')[0].extract()
  File "c:\python27\lib\site-packages\parsel\selector.py", line 58, in __getitem__
    o = super(SelectorList, self).__getitem__(pos)
IndexError: list index out of range

My updated code:

from ssl_abuse.items import SslAbuseItem
import scrapy
class SslAbuseSpider(scrapy.Spider):
    name='ssl_abuse'
    start_urls=['https://sslbl.abuse.ch/']
    def parse(self, response):
        for sel in response.xpath('//table//tr'):
            item=SslAbuseItem()
            item['date']=sel.xpath('/td/text()')[0].extract()
            item['name']=sel.xpath('/td/text()')[2].extract()
            item['type']=sel.xpath('/td/text()')[3].extract()
            yield item

Will · Accepted Answer

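I did a quick test with scrapy shell. It seems the XPath expressions are the problem: a path that starts with a single / is evaluated from the document root, so /table//tr and /td/text() match nothing, and indexing the empty result raises the IndexError. The response.body looks like: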

...

...
The first row is the table header (th cells); the real content starts from the second row.
For example:
# scrapy shell 'https://sslbl.abuse.ch/'
>>> rows = response.xpath('//table//tr')
>>> head = rows[0]

>>> head.xpath('th/text()').extract()
[u'Listing date (UTC)', u'SHA1 fingerprint', u'Common Name', u'Listing reason']

>>> td1 = rows[1]
>>> td1.xpath('td')
[<Selector xpath='td' data=u'<td>2016-12-30 07:54:19</td>'>, ...]

Listing date (UTC)   SHA1 fingerprint                          Common Name                             Listing reason
2016-12-30 07:54:19  1d05c6fef14d2671d759a05b496464b831c650e8  host/emailAddress=web@host              Gootkit C&C
2016-12-28 10:03:54  a82dd258544acf0a109296493421262397741db7  google.com/emailAddress=web@google.com  Gootkit C&C
2016-12-27 19:19:35  df6f665e91d2fe8a338f778ad53c1921fcab3d8f  CN=p.fmsacademy.it                      Gozi MITM
