Reputation: 973
I'm running Python 2.7 on a Windows machine and trying to use Scrapy,
following the basic Scrapy tutorial at http://doc.scrapy.org/en/latest/intro/overview.html
I've created the following spider and saved it as Test2 in C:\Python27\Scrapy:
import scrapy

class StackOverflowSpider(scrapy.Spider):
    name = 'stackoverflow'
    start_urls = ['http://stackoverflow.com/questions?sort=votes']

    def parse(self, response):
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        yield {
            'title': response.css('h1 a::text').extract_first(),
            'votes': response.css('.question .vote-count-post::text').extract_first(),
            'body': response.css('.question .post-text').extract_first(),
            'tags': response.css('.question .post-tag::text').extract(),
            'link': response.url,
        }
The next step tells me to run the spider using
scrapy runspider stackoverflow_spider.py -o top-stackoverflow-questions.json
But I have no idea where to run that line of code.
I am used to putting a print or a write-to-CSV command at the end of my Python file in order to retrieve results.
I'm sure this is an easy fix, but I'm not getting it. Thanks in advance.
Upvotes: 0
Views: 604
Reputation: 38
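You will need to execute the runspider command in whatever command-line utility you are using, e.g. Cygwin or cmd, rather than from inside the Python file. Change to the directory where the spider is saved first, and use the spider's actual file name in place of stackoverflow_spider.py.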
That command will create a file called top-stackoverflow-questions.json in the directory in which you run the command.
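If you would rather start the crawl from a Python script, as you are used to doing, one option is to have Python invoke the same command for you. The following is only a minimal sketch: the helper names build_runspider_command and run_spider are made up for illustration, and it assumes that the scrapy executable is on your PATH and that the spider was saved as Test2.py (the question only says "Test2").

```python
import subprocess


def build_runspider_command(spider_file, output_file):
    """Return the argument list for `scrapy runspider <file> -o <output>`."""
    return ["scrapy", "runspider", spider_file, "-o", output_file]


def run_spider(spider_file, output_file, cwd):
    """Run the spider from the directory `cwd` and return the exit code.

    Assumes the `scrapy` executable can be found on PATH.
    """
    cmd = build_runspider_command(spider_file, output_file)
    return subprocess.call(cmd, cwd=cwd)


# Example call (assumed file name and folder; adjust to your setup):
# run_spider("Test2.py", "top-stackoverflow-questions.json", r"C:\Python27\Scrapy")
```

This does the same thing as typing the command into cmd after changing to the spider's folder, so the JSON file still ends up in that folder.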
Upvotes: 1