This is my very first time attempting to scrape websites using CrawlSpider, and to my sadness my spider is not returning any results. I am also new to Python, so if I make any obvious mistakes, please be patient with me.
Below is my code:
from scrapy.settings import Settings
from scrapy.settings import default_settings
from selenium import webdriver
from urlparse import urlparse
import csv
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy import log

default_settings.DEPTH_LIMIT = 3

class MySpider(CrawlSpider):
    def __init__(self, url, ofile, csvWriter):
        self.url = url
        self.driver = webdriver.PhantomJS('/usr/local/bin/phantomjs')
        self.ofile = ofile
        self.csvWriter = csvWriter
        self.name = "jin"
        self.start_urls = [url]
        self.rules = [Rule(SgmlLinkExtractor(), callback='parse_website', follow=True)]

    def parse_website(self, response):
        url = self.url
        driver = self.driver
        csvWriter = self.csvWriter
        ofile = self.ofile
        self.log('A response from %s just arrived!' % response.url)
        driver.get(url)
        htmlSiteUrl = self.get_site_url(driver)
        htmlImagesList = self.get_html_images_list(driver, url)

    def get_site_url(self, driver):
        url = driver.current_url
        return url

    def get_html_images_list(self, driver, url):
        listOfimages = driver.find_elements_by_tag_name('img')
        return listOfimages

    driver.close()

with open('/Users/hyunjincho/Desktop/BCorp_Websites.csv') as ifile:
    website_batch = csv.reader(ifile, dialect=csv.excel_tab)
    ofile = open('/Users/hyunjincho/Desktop/results.csv', 'wb')
    csvWriter = csv.writer(ofile, delimiter=' ')
    for website in website_batch:
        url = ''.join(website)
        aSpider = MySpider(url, ofile, csvWriter)
    ofile.close()
Why is my spider not scraping anything? Have I done something wrong in my code? Can someone help me out?
You're not supposed to launch the spider by instantiating it yourself; see how it's done in the excellent Scrapy tutorial. Spiders are started from the command line:
scrapy crawl jin
Also, if you wish to read URLs from an external file, see Scrapy read list of URLs from file to scrape?
Last, output is done by creating items and handling them with the configured pipelines; if you wish to write them to a CSV file, use the CSV item exporter.