Reputation: 329
I am currently using Scrapy to gather data and output it to a JSON file with:
scrapy crawl foobar -a category=foo -o bar.json
However, this appends to bar.json rather than overwriting it. I would like to clear the file and rewrite it. Is this possible with a Scrapy argument at all?
Or would I need to clear it outside Scrapy first?
Many thanks.
Upvotes: 0
Views: 2230
Reputation: 470
You can also add the line open(LOG_FILE, "w+").close() to your settings.py, where LOG_FILE is the name of your log file. This opens the file, clears it and closes it again.
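Here is a minimal sketch of that settings.py approach. It assumes your log file is called `scrapy.log`, which is a hypothetical name. The truncation runs at module level, so it happens every time Scrapy loads the project settings, not only when a crawl starts.

```python
# settings.py (sketch; "scrapy.log" is a hypothetical file name)
LOG_FILE = "scrapy.log"

# Opening in "w+" mode truncates the file (creating it if it is missing);
# closing it immediately leaves an empty file for Scrapy to log into.
open(LOG_FILE, "w+").close()
```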
Upvotes: 0
Reputation: 1421
Overwriting feeds was added to Scrapy on Aug 17, 2020 in PR #4512 (first released in Scrapy 2.4). Use the -O flag to overwrite the output file. The final command looks like this:
scrapy crawl foobar -a category=foo -O bar.json
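You can also configure the same behaviour in settings.py, or in a spider's `custom_settings`, through the `FEEDS` setting. Its per-feed `overwrite` option was added alongside `-O`. The sketch below reuses the `bar.json` file name from the question. Adjust the path and format to your project.

```python
# settings.py (sketch): roughly equivalent to passing "-O bar.json" on every crawl
FEEDS = {
    "bar.json": {
        "format": "json",
        # Replace the file's previous contents instead of appending to them
        "overwrite": True,
    },
}
```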
Upvotes: 0
Reputation: 33
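Modify your spider like the following:
from scrapy import Spider


class MySpider(Spider):
    """
    Main crawler
    """
    name = "mucrawler"
    allowed_domains = ["sss.com"]
    start_urls = ["https://www.sdsd/rov/"]

    # Empty the output file. This class-level statement runs when
    # Scrapy imports the spider module, not when the crawl starts.
    open("bar.json", "w").close()

    def parse(self, response):
        titles = response.css("td.offer")
        # ... extract and yield items from `titles` here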
Upvotes: 1
Reputation: 257
You can remove the output file first, then start crawling for new data using:
rm output_file_name.csv; scrapy crawl spider_name -o output_file_name.csv
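If you would rather keep this in a script, a small Python wrapper can do the same job. `fresh_crawl` is a hypothetical helper and not part of Scrapy. It deletes the output file if it exists and then runs the same `scrapy crawl` command. Plain `rm` prints an error when the file does not exist yet, although the `;` still lets the crawl run. This helper avoids that error.

```python
import subprocess
from pathlib import Path


def fresh_crawl(spider_name, output_file, *extra_args, runner=subprocess.run):
    """Delete output_file if present, then run `scrapy crawl` writing to it.

    Hypothetical helper; `runner` is injectable so the command can be
    inspected without actually launching Scrapy.
    """
    # missing_ok=True (Python 3.8+) makes this a no-op when the file is
    # absent, similar to `rm -f`
    Path(output_file).unlink(missing_ok=True)
    cmd = ["scrapy", "crawl", spider_name, *extra_args, "-o", output_file]
    # check=True raises CalledProcessError if the crawl exits with an error
    return runner(cmd, check=True)
```

For example, `fresh_crawl("spider_name", "output_file_name.csv")` mirrors the shell command above.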
Upvotes: 2
Reputation: 28799
In addition to what @GHaijba has said, another solution would be to create your own pipeline, where you can apply whatever actions you want to any file.
For example, you can check whether the file exists, and then either clear it or append data to it.
You can write to different files.
You can also clean some of your items in the pipeline, since it is not good practice to do that in your spider.
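Below is a minimal sketch of such a pipeline, modeled on the JSON-writer example in the Scrapy item pipeline documentation. The class name, the `bar.jl` default path and the module path used to enable it are hypothetical. Items are assumed to be dict-like. Opening the file in `"w"` mode in `open_spider` discards whatever the previous crawl left behind.

```python
import json


class OverwritingJsonLinesPipeline:
    """Write every scraped item to a JSON Lines file, replacing old contents.

    Sketch only: class name and default path are hypothetical, and items
    are assumed to be dicts (or dict-like).
    """

    def __init__(self, path="bar.jl"):
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # "w" mode truncates any file left over from a previous crawl
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # One JSON object per line; return the item so later pipelines get it
        self.file.write(json.dumps(dict(item)) + "\n")
        return item
```

To enable it, you would add something like `ITEM_PIPELINES = {"myproject.pipelines.OverwritingJsonLinesPipeline": 300}` to settings.py. Here `myproject.pipelines` stands in for wherever the class actually lives.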
Upvotes: 0
Reputation: 3691
At the time of writing, there was no built-in solution for this, although there was an open issue about it on GitHub.
This means you have to remove the file before launching your crawl.
One workaround would be to write an item exporter that removes the output file when it is initialized (and exports the items while you are at it).
Upvotes: 0