dcarlo56ave

Reputation: 253

Scrapy CSV header row format with Multiple spiders and CSVItemExporter

I am running four spiders and exporting the data into one CSV file. However, when the second spider runs and gets data, it writes a duplicate row of column names. I tried to format the header row with FEED_EXPORT_FIELDS, but that did not work.

What I am looking to do is have one header row and populate all the data below it for each spider. The image shows the problem, and below I gave an example of what I am looking to accomplish.

I did look at CsvItemExporter, but I am not clear on how I would get the data from all four spiders and export it. I have read over the documentation but still don't see how to tie all this together.

import os
import sys

from scrapy.crawler import CrawlerProcess

TMP_FILE = os.path.join(os.path.dirname(sys.modules['products'].__file__), 'tmp/allproducts.csv')
FIELDS = ['url', 'company', 'location', 'price', 'make', 'model', 'year', 'height']

process = CrawlerProcess({
    'FEED_FORMAT': 'csv',
    'FEED_URI': TMP_FILE,
    'FEED_EXPORT_FIELDS': FIELDS,
})
process.crawl(Spider1)
process.crawl(Spider2)
process.start()


Upvotes: 2

Views: 657

Answers (1)

Guillaume

Reputation: 1879

You have several options:

  1. Each spider writes to its own file, and then you combine everything at the end in a separate process.
  2. Instead of writing to a file, each spider has an item pipeline that writes into a message queue, and a separate process consumes the messages from the queue and writes them into a single CSV file.
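For option 1, a minimal sketch of the merge step, assuming each spider wrote its own CSV with the same header; the `combine_csvs` helper and the `tmp/spider*.csv` naming pattern are hypothetical, and `FIELDS` is the list from the question:

```python
import csv
import glob

FIELDS = ['url', 'company', 'location', 'price', 'make', 'model', 'year', 'height']

def combine_csvs(pattern, out_path):
    """Write one header row, then append the data rows of every input file."""
    with open(out_path, 'w', newline='') as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        for path in sorted(glob.glob(pattern)):
            with open(path, newline='') as f:
                # DictReader consumes each file's own header row, so only
                # the single header written above ends up in the output.
                for row in csv.DictReader(f):
                    writer.writerow(row)

# e.g. combine_csvs('tmp/spider*.csv', 'tmp/allproducts.csv')
```

You would run this once after `process.start()` returns, since all crawlers have finished by then.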

Upvotes: 1
