Reputation: 12885
How can I in scrapy shell output results to a file, preferably csv?
I have a list of interesting elements in my bpython shell, and I can make items out of them. But how do I redirect the output to a file?
Upvotes: 2
Views: 2640
Reputation: 911
Once you are in the shell, you can do whatever you want to do using Python. That includes reading/writing data from/to a file using json or csv modules, for instance.
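For example, a minimal sketch using just the standard library's csv module from the shell (the items list and file name here are only placeholders):
import csv

items = [{'one': 'data', 'two': 'more data'}, {'one': 'info', 'two': 'more info'}]

with open('data.csv', 'w', newline='') as f:
    # DictWriter maps each dict's keys onto the given column order
    writer = csv.DictWriter(f, fieldnames=['one', 'two'])
    writer.writeheader()
    writer.writerows(items)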
But, since we are talking about Scrapy and csv, let's use Scrapy's CsvItemExporter to get the job done:
from scrapy.exporters import CsvItemExporter

items = [{'one': 'data', 'two': 'more data'}, {'one': 'info', 'two': 'more info'}]

# CsvItemExporter expects a file opened in binary mode
with open('data.csv', 'wb') as f:
    exporter = CsvItemExporter(file=f, fields_to_export=['one', 'two'])
    exporter.start_exporting()
    for i in items:
        exporter.export_item(i)
    exporter.finish_exporting()
That's a stripped-down version of what Scrapy does when you add the -o option to the crawl command to save the output to a file.
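In other words, running something like scrapy crawl myspider -o data.csv (with myspider standing in for your actual spider name) produces the same kind of CSV file.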
Upvotes: 5
Reputation: 10450
Does the following answer your question?
https://doc.scrapy.org/en/latest/topics/feed-exports.html
One of the most frequently required features when implementing scrapers is being able to store the scraped data properly and, quite often, that means generating an “export file” with the scraped data (commonly called “export feed”) to be consumed by other systems. Scrapy provides this functionality out of the box with the Feed Exports, which allows you to generate a feed with the scraped items, using multiple serialization formats and storage backends.
https://doc.scrapy.org/en/latest/topics/feed-exports.html#topics-feed-format-csv
CSV
FEED_FORMAT: csv
Exporter used: CsvItemExporter
To specify columns to export and their order use FEED_EXPORT_FIELDS. Other feed exporters can also use this option, but it is important for CSV because unlike many other export formats CSV uses a fixed header.
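If it helps, here is a rough sketch of what those settings could look like in a spider's custom_settings (the spider name, site, and field names below are just illustrative assumptions; newer Scrapy releases express the same configuration through the FEEDS setting):
import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['http://quotes.toscrape.com']

    # Feed export settings: write scraped items to items.csv as CSV,
    # with the columns in the given order.
    custom_settings = {
        'FEED_FORMAT': 'csv',
        'FEED_URI': 'items.csv',
        'FEED_EXPORT_FIELDS': ['text', 'author'],
    }

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
            }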
Upvotes: -1