This is my very first time attempting to scrape websites using CrawlSpider, and to my sadness my spider is not returning any results. I am also new to Python, so if I make any obvious mistakes, please be patient with me.
Below is my code:
from scrapy.settings import Settings
from scrapy.settings import default_settings
from selenium import webdriver
from urlparse import urlparse
import csv
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy import log

default_settings.DEPTH_LIMIT = 3

class MySpider(CrawlSpider):
    def __init__(self, url, ofile, csvWriter):
        self.url = url
        self.driver = webdriver.PhantomJS('/usr/local/bin/phantomjs')
        self.ofile = ofile
        self.csvWriter = csvWriter
        self.name = "jin"
        self.start_urls = [url]
        self.rules = [Rule(SgmlLinkExtractor(), callback='parse_website', follow=True)]

    def parse_website(self, response):
        url = self.url
        driver = self.driver
        csvWriter = self.csvWriter
        ofile = self.ofile
        self.log('A response from %s just arrived!' % response.url)
        driver.get(url)
        htmlSiteUrl = self.get_site_url(driver)
        htmlImagesList = self.get_html_images_list(driver, url)

    def get_site_url(self, driver):
        url = driver.current_url
        return url

    def get_html_images_list(self, driver, url):
        listOfimages = driver.find_elements_by_tag_name('img')
        return listOfimages

    driver.close()

with open('/Users/hyunjincho/Desktop/BCorp_Websites.csv') as ifile:
    website_batch = csv.reader(ifile, dialect=csv.excel_tab)
    ofile = open('/Users/hyunjincho/Desktop/results.csv', 'wb')
    csvWriter = csv.writer(ofile, delimiter=' ')
    for website in website_batch:
        url = ''.join(website)
        aSpider = MySpider(url, ofile, csvWriter)
    ofile.close()
Why is my spider not scraping anything? Have I done something wrong in my code? Can someone help me out?
You're not supposed to launch the spider by instantiating it yourself; see how it's done in the excellent Scrapy tutorial. Spiders are started from the command line:
scrapy crawl jin
Also, if you wish to read URLs from an external file, see Scrapy read list of URLs from file to scrape?
Last, output is done by creating items and handling them with the configured pipelines; if you wish to write them to a CSV file, use the CSV item exporter.