RobertLD

Reputation: 77

Is there a way to compute this amount of data and still serve a responsive website?

Currently I am developing a Django + React website that will (I hope) serve a decent number of users. The project demo is mostly complete, and I am starting to think about the scale required to put this thing into production.

The website essentially does three things:

  1. Grab data from external APIs (e.g. Twitter) for 50,000 unique keywords (the keywords don't change). This process happens every 30 minutes.

  2. Run a computation on all of the data and save the results to the database. Assume that the algorithm is as optimized as possible.

  3. When a user visits the website, serve a pretty graph/chart of the computed data for each keyword.
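Step 1 spends most of its time waiting on the network rather than computing, so it can be run concurrently with a thread pool. A minimal sketch, assuming a hypothetical `fetch_keyword` stub in place of a real Twitter client; `max_workers=32` is an arbitrary starting point, not a recommendation:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_keyword(keyword: str) -> dict:
    """Hypothetical stand-in for one external API call (e.g. a Twitter search).

    A real version would make an HTTP request here; because that call mostly
    waits on I/O, many of them can overlap in threads.
    """
    return {"keyword": keyword, "posts": [f"{keyword} post {i}" for i in range(3)]}


def fetch_all(keywords: list, max_workers: int = 32) -> dict:
    """Fetch data for every keyword concurrently, returned as {keyword: data}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with keywords.
        results = pool.map(fetch_keyword, keywords)
        return dict(zip(keywords, results))


if __name__ == "__main__":
    data = fetch_all(["django", "react", "celery"])
    print(len(data))  # 3
```

In practice the external API's rate limits are likely to cap throughput before the number of local threads does, so `max_workers` would be tuned against those limits.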

The issue is that this is far too intensive a task to be done by the same application that serves the website; users would be waiting decades to see their data. My current plan is to build a separate API that supplies the website with the data, which the website can then store in its database. This separate API would process the data without risk of affecting users, and it should be able to finish each round of computation in under 30 minutes, in time for the next batch of data.
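The core of this plan is separating the write path (a background job that fetches and computes) from the read path (a web request that only looks up precomputed results). A minimal sketch of that split, assuming a hypothetical table named `keyword_results` and using the standard-library `sqlite3` module in place of the Django ORM so it runs on its own:

```python
from __future__ import annotations

import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # One row per keyword holding the latest precomputed chart data.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyword_results ("
        " keyword TEXT PRIMARY KEY,"
        " computed_at TEXT NOT NULL,"
        " chart_json TEXT NOT NULL)"
    )


def save_results(conn: sqlite3.Connection, computed_at: str,
                 results: dict[str, list[float]]) -> None:
    """Background side: overwrite each keyword's row with the newest series."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO keyword_results VALUES (?, ?, ?)",
            [(kw, computed_at, json.dumps(s)) for kw, s in results.items()],
        )


def load_chart(conn: sqlite3.Connection, keyword: str) -> list[float] | None:
    """Web side: one primary-key lookup, no computation at request time."""
    row = conn.execute(
        "SELECT chart_json FROM keyword_results WHERE keyword = ?", (keyword,)
    ).fetchone()
    return json.loads(row[0]) if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    save_results(conn, "run-1", {"django": [1.0, 2.5, 3.0]})
    print(load_chart(conn, "django"))  # [1.0, 2.5, 3.0]
```

With this shape, how long the computation takes only affects how fresh the data is, not how long a page request takes.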

Can anyone help me understand how I can better equip my project to handle the scale? I'd love some ideas.

As a fourth-year CS student, I figured it's time to put a real project out into the world, and I am very excited about it and the progress I've made so far. My main worry is that end users will be negatively affected if I don't figure out some kind of pipeline to make this process happen.

To reiterate my idea:

  1. Django + React - This is the forward facing website
  2. External API - Grabs the data off the internet, processes it, and waits for a GET request from the website
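The "External API" half of this design only needs to answer GET requests with results it has already computed. A minimal sketch using the standard-library `http.server`, assuming a hypothetical `/results/<keyword>` route and an in-memory `LATEST_RESULTS` dict standing in for the database the background job writes to:

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import unquote, urlsplit

# Hypothetical snapshot of the latest computed results; a real service would
# read these from the database that the background job writes to.
LATEST_RESULTS = {"django": [1.0, 2.5, 3.0]}
PREFIX = "/results/"


class ResultsHandler(BaseHTTPRequestHandler):
    """Answers GET /results/<keyword> with that keyword's precomputed series."""

    def do_GET(self):
        path = urlsplit(self.path).path  # ignore any query string
        if not path.startswith(PREFIX):
            self.send_error(404, "unknown path")
            return
        keyword = unquote(path[len(PREFIX):])
        series = LATEST_RESULTS.get(keyword)
        if series is None:
            self.send_error(404, "unknown keyword")
            return
        body = json.dumps({"keyword": keyword, "series": series}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr for this sketch.
        pass


def run(host: str = "127.0.0.1", port: int = 8001) -> None:
    """Serve until interrupted; port 8001 is an arbitrary choice."""
    with ThreadingHTTPServer((host, port), ResultsHandler) as server:
        server.serve_forever()
```

Calling `run()` starts the service; the Django side would then issue `GET /results/django` and receive `{"keyword": "django", "series": [1.0, 2.5, 3.0]}`. Because the handler never computes anything, request latency stays independent of the 30-minute processing job.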

Is there a better way to do this? Or, on the other hand, am I severely overestimating how computationally heavy this is?
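One way to check whether the load is overestimated is a back-of-the-envelope time budget. The 50,000 keywords and the 30-minute window come from the question; the worker count below is hypothetical, and the budget assumes work splits evenly across workers:

```python
KEYWORDS = 50_000   # from the question
WINDOW_S = 30 * 60  # refresh interval from the question: 30 minutes


def required_rate_per_s(keywords: int, window_s: float) -> float:
    """Keywords that must be fetched and processed per second, across all workers."""
    return keywords / window_s


def per_keyword_budget_ms(keywords: int, window_s: float, workers: int = 1) -> float:
    """Time each worker may spend per keyword (ms) so a full pass fits the window."""
    return window_s * 1000 * workers / keywords


if __name__ == "__main__":
    print(round(required_rate_per_s(KEYWORDS, WINDOW_S), 1))    # 27.8
    print(per_keyword_budget_ms(KEYWORDS, WINDOW_S))             # 36.0
    print(per_keyword_budget_ms(KEYWORDS, WINDOW_S, workers=8))  # 288.0
```

So a single sequential process has about 36 ms per keyword for fetching plus computing, while eight parallel workers would have about 288 ms each. Whether that is tight depends on the external API's latency and the algorithm's cost, which are worth measuring rather than guessing.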

Edit: including my current research:

Handling computationally intensive tasks in a Django webapp

Separation of business logic and data access in django

Upvotes: 1

Views: 56

Answers (1)

Avi Nehama

Reputation: 143

What you want is to have the computation task executed by a different process in the "background". The most straightforward and popular solution is to use Celery, see here.
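For the 30-minute refresh, Celery's periodic-task scheduler (celery beat) can enqueue the job while one or more workers execute it. A minimal settings-module sketch, assuming Redis as the broker, a hypothetical project package `pipeline`, a hypothetical task `refresh_all_keywords` defined with `@app.task` in `pipeline/tasks.py`, and a hypothetical `CELERY_BROKER_URL` environment variable:

```python
# celeryconfig.py: hypothetical settings module, loaded on the Celery app
# with app.config_from_object("celeryconfig").
import os

# Assumption: Redis is the message broker. Pointing this URL at another host
# is what later lets workers run on a separate machine.
broker_url = os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0")

# celery beat enqueues this task every 30 minutes ("schedule" is in seconds).
beat_schedule = {
    "refresh-all-keywords": {
        "task": "pipeline.tasks.refresh_all_keywords",  # hypothetical task name
        "schedule": 30 * 60.0,
    },
}
```

With this in place, `celery -A pipeline beat` would schedule the job and `celery -A pipeline worker --loglevel=INFO` would run it, both as processes separate from the Django web server.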

The Celery worker(s), which perform the background task, can run on the same machine as the web application or, when scale becomes an issue, you can change the configuration so that they run on an entirely different machine.

Upvotes: 1
