Reputation: 79
I have a basic Django web application running on Heroku. I would like to add a spider that crawls some websites (e.g. with Scrapy) as a scheduled task (e.g. via APScheduler), so that some tables in the Django database get loaded with the collected data.
Does anybody know of documentation or examples covering the basics of this kind of integration? I'm finding it very hard to figure out.
Upvotes: 1
Views: 965
Reputation: 1620
I have not used Scrapy at all, but I'm currently working with APScheduler and it's very simple to use. So my first guess would be to use a BackgroundScheduler (inside your Django app) and add a job to it that executes a callable "spider" periodically.
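For illustration, here is a minimal sketch of that idea; the `run_spider` callable and the one-hour interval are just assumptions for the example, and in a Django app you would typically start the scheduler from an `AppConfig.ready()` hook:

```python
# Minimal sketch: start a BackgroundScheduler and register a periodic job.
from apscheduler.schedulers.background import BackgroundScheduler

def run_spider():
    # Placeholder for whatever actually triggers your Scrapy spider
    # (see the next snippet for one way to do that).
    print("crawling...")

scheduler = BackgroundScheduler()
scheduler.add_job(run_spider, "interval", hours=1)  # run once per hour
scheduler.start()
```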
The tricky part is how to embed a Scrapy project inside your Django app so that you can access one of its spiders and effectively use it as a callable in your scheduled job.
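One possible approach (a sketch, not something I've verified with Scrapy myself): since Scrapy runs on Twisted, whose reactor cannot be restarted within the same process, a simple option for a recurring job is to launch each crawl in a subprocess via the `scrapy crawl` command. The project directory `scraper` and the spider name `myspider` below are assumptions; adjust them to your project:

```python
# Sketch of a schedulable callable that runs a Scrapy spider in a
# subprocess, sidestepping Twisted's non-restartable reactor.
import subprocess

def run_spider():
    # Assumes a Scrapy project lives in ./scraper and defines a spider
    # named "myspider" (both hypothetical names).
    subprocess.run(
        ["scrapy", "crawl", "myspider"],
        cwd="scraper",
        check=True,
    )
```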
I may not be helping much, but I'm just trying to give you a kickstart. I'm pretty sure that if you carefully read Scrapy's documentation you'll find your way.
Best.
Upvotes: 2