hansolo

Reputation: 973

Python Scrapy - Run Spider

I'm running Python 2.7 on a Windows machine and attempting to use Scrapy.

I'm following the basic Scrapy tutorial at http://doc.scrapy.org/en/latest/intro/overview.html

I've created the following spider and saved it as Test2 in C:\Python27\Scrapy:

import scrapy


class StackOverflowSpider(scrapy.Spider):
    name = 'stackoverflow'
    start_urls = ['http://stackoverflow.com/questions?sort=votes']

    def parse(self, response):
        # Follow the link to each question listed on the start page.
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        # Yield one item (a dict) per question page; -o writes these to the output file.
        yield {
            'title': response.css('h1 a::text').extract_first(),
            'votes': response.css('.question .vote-count-post::text').extract_first(),
            'body': response.css('.question .post-text').extract_first(),
            'tags': response.css('.question .post-tag::text').extract(),
            'link': response.url,
        }
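The parse method above turns each relative question link into an absolute URL with response.urljoin, which resolves the href against the page's URL (or its base tag, if the page has one). The standard-library urljoin shows the same resolution in isolation. This sketch is Python 3 (urllib.parse; on Python 2.7 the function lives in the urlparse module), and the href is a hypothetical example.

```python
from urllib.parse import urljoin

# Page the spider starts from (start_urls in the spider above).
page_url = 'http://stackoverflow.com/questions?sort=votes'

# Hypothetical relative href, as the '.question-summary h3 a::attr(href)' selector might return it.
href = '/questions/12345/example-question'

# An href starting with '/' keeps the scheme and host of page_url,
# but replaces its path and drops its query string.
full_url = urljoin(page_url, href)
print(full_url)  # http://stackoverflow.com/questions/12345/example-question
```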

The next step tells me to run the spider using:

scrapy runspider stackoverflow_spider.py -o top-stackoverflow-questions.json

But I have no idea where to run that command.

I'm used to putting a print statement or a save-to-CSV step at the end of my Python file to retrieve the results.

I'm sure this is an easy fix, but I'm not getting it. Thanks in advance.

Upvotes: 0

Views: 604

Answers (1)

Dan H

Reputation: 38

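You need to execute the runspider command in whatever command-line shell you are using, e.g. Cygwin or cmd. Open it, cd into the folder that contains your spider file, and run the command there, replacing stackoverflow_spider.py with your file's actual name.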
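If you would rather start the crawl from a Python file, closer to the print/CSV habit described in the question, one minimal sketch is to launch the same command through subprocess. It is written for Python 3 and assumes the scrapy executable is on your PATH; build_runspider_cmd and run_spider are hypothetical helper names. Passing cwd plays the role of cd-ing into the spider's folder first, so a relative output path is written there.

```python
import subprocess
from pathlib import Path


def build_runspider_cmd(spider_file, output_file):
    """Return the argument list for: scrapy runspider <spider_file> -o <output_file>."""
    return ["scrapy", "runspider", str(spider_file), "-o", str(output_file)]


def run_spider(spider_dir, spider_file, output_file):
    """Run the spider with spider_dir as the working directory.

    Assumes `scrapy` is on PATH. A relative output_file ends up inside
    spider_dir, exactly as if you had cd'd there and typed the command.
    """
    cmd = build_runspider_cmd(spider_file, output_file)
    # check=True raises CalledProcessError if scrapy exits with a non-zero status.
    subprocess.run(cmd, cwd=spider_dir, check=True)
    return Path(spider_dir) / output_file
```

For example, run_spider(r"C:\Python27\Scrapy", "stackoverflow_spider.py", "top-stackoverflow-questions.json") would return the path of the JSON file once the crawl finishes. On Python 2.7, which has neither subprocess.run nor pathlib, subprocess.check_call(cmd, cwd=spider_dir) does the same job.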

The runspider command will create a file called top-stackoverflow-questions.json in the directory in which you run it.
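Once that JSON file exists, you can load it in an ordinary Python script to print it or save it as CSV. A minimal sketch follows; it assumes the file came from a single run (Scrapy's JSON feed is one JSON array of item dicts, and appending a second run to the same file leaves it invalid), and json_feed_to_csv is a hypothetical helper name. Recent Scrapy versions also pick the output format from the extension, so -o top-stackoverflow-questions.csv should produce a CSV directly.

```python
import csv
import json


def json_feed_to_csv(json_path, csv_path):
    """Convert a Scrapy JSON feed (a JSON array of item dicts) to CSV.

    Returns the loaded items so they can also be printed or inspected.
    """
    with open(json_path, encoding="utf-8") as f:
        items = json.load(f)

    # Keys yielded by parse_question; the bulky HTML 'body' is left out.
    fieldnames = ["title", "votes", "link", "tags"]
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" silently drops keys not in fieldnames (e.g. 'body').
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for item in items:
            row = dict(item)
            # 'tags' is a list; join it so it fits into a single CSV cell.
            row["tags"] = ";".join(item.get("tags") or [])
            writer.writerow(row)
    return items
```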

Upvotes: 1
