RS7

Reputation: 2361

Celery - error handling and data storage

I'm trying to better understand common strategies regarding results and errors in Celery.

I see that results have statuses/states and that Celery stores results if requested. When would I use this data? Should error handling and data storage be contained within the task?

Here is a sample scenario, in case it helps better understand my objective:

I have a geocoding task that geocodes user addresses. Whether the task fails or succeeds, I'd like to update a field in the database letting the user know (error handling). On success, I'd like the geocoded data to be inserted into the database (data storage).

What approach should I take?

Upvotes: 2

Views: 2404

Answers (1)

Benjamin White

Reputation: 779

Let me preface this by saying that I'm still getting a feel for Celery myself. That being said, I have some general inclinations about how I'd go about tackling this, and since no one else has responded, I'll give it a shot.

Based on what you've written, a relatively simple (though I suspect non-optimized) solution is to follow the broad contours of the blog comment spam task example from the documentation.

app.models.py

from django.db import models

class Address(models.Model):

  GEOCODE_STATUS_CHOICES = (
    ('pr', 'pre-check'),
    ('su', 'success'), 
    ('fl', 'failed'),
  )

  address = models.TextField()
  ...
  geocode = models.TextField()
  geocode_status = models.CharField(max_length=2, 
                                    choices=GEOCODE_STATUS_CHOICES, 
                                    default='pr')

class AppUser(models.Model):
  name = models.CharField(max_length=100)
  ...
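  # on_delete is required since Django 2.0; CASCADE was the old implicit default
  address = models.ForeignKey(Address, on_delete=models.CASCADE)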

app.tasks.py

# shared_task replaces the older `from celery import task`, which was removed in Celery 5
from celery import shared_task
from app.models import AppUser
from some_module import geocode_function  # assuming this returns a string

@shared_task
def get_geocode(appuser_pk):
  user = AppUser.objects.get(pk=appuser_pk)
  address = user.address

  try:
    result = geocode_function(address.address)
    address.geocode = result
    address.geocode_status = 'su'  # mark the address object as successful
    address.save()
    # Returning a value is optional; your task doesn't have to return anything.
    # On the other hand, you could also choose to decouple the geocode function
    # from the database update for the object instance. Also, if you're thinking
    # about chaining tasks together, consider whether it's advantageous to pass
    # a parameter as an input (or partial input) into the child task.
    return address.geocode
  except Exception:
    address.geocode_status = 'fl'  # mark the address object as failed
    address.save()
    # do something_else()
    raise  # re-raise the error, in case you want to trigger retries, etc.

app.views.py

from django.shortcuts import get_object_or_404
from app.models import AppUser
from app.tasks import get_geocode

def geocode_for_address(request, app_user_pk):
  app_user = get_object_or_404(AppUser, pk=app_user_pk)

  # ...etc. Somewhere in here, call your task with the appropriate
  # args/kwargs, for example get_geocode.delay(app_user.pk)

I believe this meets the minimal requirements you've outlined above. I've intentionally left the view undeveloped since I don't have a sense of how exactly you want to trigger it. It sounds like you may also want some sort of user notification when their address can't be geocoded ("I'd like to update a field in the database letting the user know"). Without knowing more about the specifics of this requirement, I would say it sounds like something best accomplished either in your HTML templates (if instance.attribute has value X, display q in the template) or with Django signals (set up a signal handler for when user.address.geocode_status switches to failure and, say, email the user to let them know).
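To make the template idea a bit more concrete, here is a minimal sketch of mapping the stored status code to a user-facing message. The `STATUS_MESSAGES` dict, the `geocode_status_message` helper, and the message wording are all hypothetical; only the 'pr'/'su'/'fl' codes come from the model above.

```python
# Hypothetical messages keyed by the codes in GEOCODE_STATUS_CHOICES.
STATUS_MESSAGES = {
    'pr': 'Your address has not been geocoded yet.',
    'su': 'Your address was geocoded successfully.',
    'fl': 'We could not geocode your address. Please check it and try again.',
}


def geocode_status_message(status_code):
    """Return the message to display for a given geocode_status value."""
    # Fall back to a generic message for codes this mapping doesn't know.
    return STATUS_MESSAGES.get(status_code, 'Unknown geocoding status.')
```

A view could pass `geocode_status_message(address.geocode_status)` into the template context. If the short labels from the choices tuple are enough, Django's built-in `address.get_geocode_status_display()` already returns them without any extra code.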

In the comments to the code above, I mentioned the possibility of decoupling and chaining the component parts of the get_geocode task. You could also decouple the exception handling from the get_geocode task by writing a custom error handler task and using the link_error parameter (for instance, add.apply_async((2, 2), link_error=error_handler.s()), where error_handler has been defined as a task in app.tasks.py). Also, whether you choose to handle errors in the main task (get_geocode) or in a linked error handler, I would think you'd want to get much more specific about how to handle different sorts of errors (e.g., treating connection errors differently from incorrectly formatted address data).
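As a rough sketch of what "more specific" might look like, the except branch could first sort exceptions into transient ones worth retrying and permanent ones that should mark the address as failed. The two exception classes and the `classify_geocode_error` helper are hypothetical names, not part of Celery or any geocoding library; built-in exception types stand in for whatever your geocoder actually raises.

```python
class GeocoderUnavailable(ConnectionError):
    """Hypothetical: the geocoding service could not be reached."""


class MalformedAddress(ValueError):
    """Hypothetical: the address text could not be parsed."""


def classify_geocode_error(exc):
    """Return 'retry' for transient failures and 'fail' for permanent ones."""
    # Network-level problems are usually worth another attempt later.
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return 'retry'
    # Bad input will not improve on a retry, so record the failure.
    if isinstance(exc, ValueError):
        return 'fail'
    # Anything unexpected is treated as permanent so it isn't retried forever.
    return 'fail'
```

Inside the task, a 'retry' outcome could map to `self.retry(exc=exc)` on a task declared with `bind=True`, while 'fail' would set `geocode_status = 'fl'` and save, as in the code above.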

I suspect there are better approaches, and I'm just beginning to understand how inventive you can get by chaining tasks, using groups and chords, etc. Hope this helps at least get you thinking about some of the possibilities. I'll leave it to others to recommend best practices.

Upvotes: 1
