Reputation: 1503
Is it possible to crawl local files with Scrapy 0.18.4 without having an active project? I've seen this answer and it looks promising, but to use the crawl command you need a project.
Alternatively, is there an easy/minimalist way to set up a project for an existing spider? I have my spider, pipelines, middleware, and items defined in one Python file. I've created a scrapy.cfg file with only the project name. This lets me use crawl, but since I don't have a spiders folder, Scrapy can't find my spider. Can I point Scrapy to the right directory, or do I need to split my items, spider, etc. into separate files?
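(For illustration, a minimal scrapy.cfg of this sort might look like the following; "myproject" and the settings module name are placeholders, not my actual file.)

    # scrapy.cfg -- hypothetical minimal contents
    [settings]
    default = myproject.settings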
[edit] I forgot to say that I'm running the spider using Crawler.crawl(my_spider) - ideally I'd still like to be able to run the spider like that, but I can run it in a subprocess from my script if that's not possible.
Turns out the suggestion in the answer I linked does work - http://localhost:8000 can be used as a start_url, so there's no need for a project.
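(A minimal sketch of that setup, in case it helps someone else: serve the directory of local files with Python 2's built-in server, e.g. python -m SimpleHTTPServer 8000 run from that directory, then point the spider at it. The port, directory, and spider below are placeholders, not my actual code.)

    # Hypothetical Scrapy 0.18-style spider pointed at locally served files;
    # assumes "python -m SimpleHTTPServer 8000" is running in the files' directory.
    from scrapy.spider import BaseSpider

    class LocalFilesSpider(BaseSpider):
        name = "local_files"
        start_urls = ["http://localhost:8000/"]

        def parse(self, response):
            # parse the locally served page here
            pass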
Upvotes: 1
Views: 2065
Reputation: 473903
As an option, you can run Scrapy from a script; here is a self-contained example script and an overview of the approach used.
This doesn't mean you have to put everything in one file. You can still have spider.py, items.py, and pipelines.py - just import them correctly in the script you start crawling from.
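A minimal sketch of that approach, assuming the Scrapy 0.18-era API (a Crawler driven by the Twisted reactor); MySpider and the my_single_file module are placeholders for your own code:

    # run_spider.py -- run a spider without "scrapy crawl" or a project.
    # MySpider and my_single_file are placeholders for your own spider/module.
    from twisted.internet import reactor
    from scrapy import log, signals
    from scrapy.crawler import Crawler
    from scrapy.settings import Settings

    from my_single_file import MySpider

    spider = MySpider()
    crawler = Crawler(Settings())
    # stop the reactor when the spider finishes so the script exits cleanly
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    log.start()
    reactor.run()  # blocks here until spider_closed fires

The same script (or the spider's module) can import your item and pipeline classes, so nothing forces them into separate files.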
Upvotes: 3