zubmic

Reputation: 77

How to save output from Scrapy into file or database

I'm working on a script that goes to a specific website and collects info; after gathering the information, it should save all of it to a file (it would be even better if it saved it to a database). I've read about Feed Exports and Item Pipelines, but I'm a newbie with Python and Scrapy and haven't found a solution yet.

Can anyone explain to me how to use Feed Exports or Item Pipelines? I've read the documentation, but it isn't clear to me. Here's my code so far:

import scrapy



class BrickSetSpider(scrapy.Spider):
    name = "brickset_spider"
    start_urls = ['http://brickset.com/sets/year-2016']

    def parse(self, response):
        # Each LEGO set on the page is rendered inside a '.set' element.
        SET_SELECTOR = '.set'
        for brickset in response.css(SET_SELECTOR):

            # Pieces and minifigs need XPath so we can match on the <dt> label text.
            NAME_SELECTOR = 'h1 a ::text'
            PIECES_SELECTOR = './/dl[dt/text() = "Pieces"]/dd/a/text()'
            MINIFIGS_SELECTOR = './/dl[dt/text() = "Minifigs"]/dd[2]/a/text()'
            IMAGE_SELECTOR = 'img ::attr(src)'
            yield {
                'name': brickset.css(NAME_SELECTOR).extract_first(),
                'pieces': brickset.xpath(PIECES_SELECTOR).extract_first(),
                'minifigs': brickset.xpath(MINIFIGS_SELECTOR).extract_first(),
                'image': brickset.css(IMAGE_SELECTOR).extract_first(),
            }

        # Follow the pagination link, if there is one, and parse the next page.
        NEXT_PAGE_SELECTOR = '.next a ::attr(href)'
        next_page = response.css(NEXT_PAGE_SELECTOR).extract_first()
        if next_page:
            yield scrapy.Request(
                response.urljoin(next_page),
                callback=self.parse
            )

Learning Python is a lot of fun for me, but I'm stuck on this and I really need to get this script working. Thanks in advance for any suggestions and help.

Cheers!

Upvotes: 0

Views: 7744

Answers (2)

Singletoned

Reputation: 5129

You should just be able to set FEED_FORMAT and FEED_URI in your settings file. You don't particularly need to bother with pipelines.

Something like (in settings.py):

FEED_FORMAT = "csv"
FEED_URI = "./myfile.csv"
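
If you do want the database option the question mentions, a small item pipeline works too. The sketch below is a minimal example, not part of the original answer: it assumes SQLite via Python's built-in sqlite3 module, and the class name, database filename, and table/column names (matching the fields the spider yields) are all my own choices.

```python
# Minimal sketch of a Scrapy item pipeline that writes each scraped
# item to a SQLite table. Names here (SQLitePipeline, bricksets.db,
# the "sets" table) are illustrative assumptions.
import sqlite3


class SQLitePipeline:
    def __init__(self, db_path="bricksets.db"):
        self.db_path = db_path

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection
        # and make sure the target table exists.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sets "
            "(name TEXT, pieces TEXT, minifigs TEXT, image TEXT)"
        )

    def process_item(self, item, spider):
        # Called for every dict the spider yields; parameterized
        # INSERT avoids SQL injection from scraped values.
        self.conn.execute(
            "INSERT INTO sets VALUES (?, ?, ?, ?)",
            (item.get("name"), item.get("pieces"),
             item.get("minifigs"), item.get("image")),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

To enable it you would register the class in settings.py, e.g. ITEM_PIPELINES = {"myproject.pipelines.SQLitePipeline": 300} (the module path depends on your project layout).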

Upvotes: 3

Macondo

Reputation: 947

You can output your results to a CSV file straight from the command line with the -o option, where nameofspider is the name attribute of your spider (brickset_spider in your case):

scrapy crawl nameofspider -o file.csv

Upvotes: 0
