Quontas

Reputation: 399

Serve scraped HTML data as an API using Django Rest Framework

I'm trying to build a public-facing API that serves data collected by scraping HTML (the content of the pages is what matters, not the pages themselves). I've chosen Django-Rest-Framework as my backend. My question is: how should I structure the project so that the scraped content is stored through the Django ORM and can then be accessed through Django-Rest-Framework's API?

I've looked into Scrapy, but it seems focused more on web crawling than on content scraping. It also deploys as its own project, which conflicts with Django's bootstrapping.

Is my best bet just running cron jobs? That seems inelegant.

Upvotes: 0

Views: 1183

Answers (1)

Max Malysh

Reputation: 31555

Use Celery to create asynchronous and periodic tasks.

If you need something lightweight for scraping, you can use BeautifulSoup. Here is a tutorial.
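As a rough illustration of the lightweight route, here is a minimal sketch that pulls headlines out of an HTML string with BeautifulSoup. The markup it expects (an `<article>` with an `<h2>` wrapping a link) and the function name `extract_headlines` are assumptions made up for the example; adapt the lookups to whatever page you are actually scraping.

```python
from bs4 import BeautifulSoup


def extract_headlines(html: str) -> list[dict]:
    """Return one {'title', 'href'} dict per <article> that has an <h2> link.

    The page layout is hypothetical; only the BeautifulSoup calls are real.
    """
    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for article in soup.find_all("article"):
        heading = article.find("h2")
        link = heading.find("a") if heading else None
        if link is None:
            continue  # skip articles that do not match the assumed layout
        items.append(
            {
                "title": link.get_text(strip=True),
                "href": link.get("href", ""),
            }
        )
    return items
```

Keeping the parsing in a plain function that takes a string makes it easy to test without touching the network, and easy to call later from whatever schedules the scraping.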

Overall, this is what you need to do:

  1. Start an ordinary Django project.
  2. Add Celery to it.
  3. Write some scraping code.
  4. Call your custom scraping code from Celery tasks and save the scraped content to the database.
  5. Use Django-Rest-Framework to create an API which will serve the content from the database.

Upvotes: 1
