Python Scrapy - Run Spider

Question

I'm running Python 2.7 on a Windows machine and attempting to use Scrapy, following the basic Scrapy tutorial at http://doc.scrapy.org/en/latest/intro/overview.html

I've created the following spider and saved it as Test2 in C:\Python27\Scrapy:

import scrapy


class StackOverflowSpider(scrapy.Spider):
    name = 'stackoverflow'
    start_urls = ['http://stackoverflow.com/questions?sort=votes']

    def parse(self, response):
        # Follow the link to each question on the listing page.
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        # Extract the fields of a single question page.
        yield {
            'title': response.css('h1 a::text').extract_first(),
            'votes': response.css('.question .vote-count-post::text').extract_first(),
            'body': response.css('.question .post-text').extract_first(),
            'tags': response.css('.question .post-tag::text').extract(),
            'link': response.url,
        }

The next step tells me to run the spider using scrapy runspider stackoverflow_spider.py -o top-stackoverflow-questions.json

But I have no idea where to run that line of code.

I am used to running a print or a store to csv command at the end of my python file in order to retrieve results.

I'm sure this is easy to resolve, but I'm not getting it. Thanks in advance.

Dan H · Accepted Answer

You need to execute the runspider command in a command-line shell (e.g. cmd or Cygwin), not from inside the Python file. Pass it the path to the file your spider is saved in, in place of stackoverflow_spider.py.

That command will create a file called top-stackoverflow-questions.json in the directory in which you run the command.
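If you would rather start the crawl from a Python script, as you are used to doing, one option is to launch the same command with the standard subprocess module. This is only a minimal sketch: it assumes the spider file is named Test2.py (the question gives no extension), that it lives in C:\Python27\Scrapy, and that the scrapy executable is on your PATH. The helper names build_runspider_command and run_spider are made up for illustration.

```python
import subprocess

# Assumptions for illustration: the spider was saved as Test2.py in
# C:\Python27\Scrapy. Adjust both values to match the real file.
SPIDER_DIR = r"C:\Python27\Scrapy"
SPIDER_FILE = "Test2.py"
OUTPUT_FILE = "top-stackoverflow-questions.json"


def build_runspider_command(spider_file, output_file):
    """Return the command you would type at the prompt, as an argument list."""
    return ["scrapy", "runspider", spider_file, "-o", output_file]


def run_spider(spider_dir=SPIDER_DIR, spider_file=SPIDER_FILE,
               output_file=OUTPUT_FILE):
    """Run the spider with spider_dir as the working directory.

    Because the output path is relative, the JSON file is written
    into spider_dir. Returns the exit code of the scrapy process.
    """
    command = build_runspider_command(spider_file, output_file)
    return subprocess.call(command, cwd=spider_dir)
```

Calling run_spider() does the same thing as changing to C:\Python27\Scrapy in cmd and typing scrapy runspider Test2.py -o top-stackoverflow-questions.json there.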
