platzhersh


Using python async / await with django restframework

I am just upgrading an older project to Python 3.6, and found out that there are these cool new async / await keywords.

My project contains a web crawler that is not very performant at the moment and takes about 7 minutes to complete. Since I already have Django REST framework in place to access the data of my Django application, I thought it would be nice to have a REST endpoint where I could start the crawler remotely with a simple POST request.

However, I don't want the client to wait synchronously for the crawler to complete. I just want to immediately send the client a message that the crawler has been started, and run the crawler in the background.

from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response
from django.conf import settings
from mycrawler import tasks

async def update_all_async(deep_crawl=True, season=settings.CURRENT_SEASON, log_to_db=True):
    await tasks.update_all(deep_crawl, season, log_to_db)


@api_view(['POST', 'GET'])
def start(request):
    """
    Start crawling.
    """
    if request.method == 'POST':
        print("Crawler: start {}".format(request))

        deep = request.data.get('deep', False)
        season = request.data.get('season', settings.CURRENT_SEASON)

        # this should be called async
        update_all_async(season=season, deep_crawl=deep)

        return Response({"Success": "crawl started"}, status=status.HTTP_200_OK)
    else:
        return Response({"description": "Start the crawler by calling this endpoint via POST.", "allowed_parameters": {
            "deep": "boolean",
            "season": "number"
        }}, status.HTTP_200_OK)

I have read some tutorials, including how to use event loops, but I don't really get it. Where should I start the loop in this case?
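The reason the call in the view does nothing: calling an async function only creates a coroutine object, and its body runs only once that coroutine is awaited or scheduled on an event loop (Python also emits a "coroutine ... was never awaited" RuntimeWarning). A minimal framework-free sketch, where crawl() is a hypothetical stand-in for the crawler:

```python
import asyncio

ran = []


async def crawl():
    # Hypothetical stand-in for the crawler
    ran.append("crawled")


coro = crawl()                   # Only creates a coroutine object; the body has not run
print(type(coro).__name__, ran)  # coroutine []

asyncio.run(coro)                # Runs it on a new event loop and blocks until it is done
print(ran)                       # ['crawled']
```

Note that asyncio.run() blocks until the coroutine finishes, so calling it inside the view would not give fire-and-forget behavior either; the coroutine has to run on a loop that the request does not wait for.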

[EDIT] 20/10/2017:

I solved it using threading for now, since it really is a "fire and forget" task. However, I still would like to know how to achieve the same thing using async / await.

Here's my current solution:

import threading


@api_view(['POST', 'GET'])
def start(request):
    ...
    t = threading.Thread(target=tasks.update_all, args=(deep, season))
    t.start()
    ...


Answers (2)

castel


This is possible in Django 3.1+, which introduced support for asynchronous views.

Regarding the event loop, you can use an existing one by running Django with uvicorn or another ASGI server instead of gunicorn or another WSGI server. The difference is that an ASGI server already has a running loop, whereas under WSGI Django runs each async view in its own one-off event loop, so background tasks scheduled on it won't keep running after the view returns. With ASGI, you can simply define async views directly in views.py, or async handler methods on class-based views.
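Under WSGI, one way to get a loop that outlives the request is to start a single event loop in a daemon thread when the process starts and hand coroutines to it with asyncio.run_coroutine_threadsafe(). A minimal framework-free sketch, assuming the crawler is a coroutine; background_loop, crawl() and start_view() are hypothetical names. Each WSGI worker process would get its own loop, and a crawl still running when the process exits is lost:

```python
import asyncio
import threading

# One long-lived event loop running in a daemon thread for the life of the process
background_loop = asyncio.new_event_loop()
threading.Thread(target=background_loop.run_forever, daemon=True).start()


async def crawl(season):
    # Hypothetical stand-in for the real (async) crawler
    await asyncio.sleep(0.1)
    return "season {} crawled".format(season)


def start_view(season):
    # Hypothetical body of a sync (WSGI) view: schedule the crawl and return at once.
    # run_coroutine_threadsafe() returns a concurrent.futures.Future.
    future = asyncio.run_coroutine_threadsafe(crawl(season), background_loop)
    return {"Success": "crawl started"}, future


response, future = start_view(2017)
print(response)                  # {'Success': 'crawl started'}
print(future.result(timeout=5))  # season 2017 crawled (a real view would not wait)
```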

Assuming you go with ASGI, there are multiple ways to achieve this. I'll describe a couple (another option would be to use an asyncio.Queue, for example):

  1. Make start() async

By making start() async, you can use the server's running loop directly: wrapping the coroutine in an asyncio.Task schedules it on that loop, so you can fire and forget. If you want to fire but remember, you can create another Task that follows up on the first one:

import asyncio
import json

from django.conf import settings
from django.http import JsonResponse
from mycrawler import tasks

# The event loop only keeps weak references to tasks, so keep strong ones here
# until each task is done; otherwise a task could be garbage-collected mid-run
background_tasks = set()

async def update_all_async(deep_crawl=True, season=settings.CURRENT_SEASON, log_to_db=True):
    await tasks.update_all(deep_crawl, season, log_to_db)

async def follow_up_task(task: asyncio.Task):
    await asyncio.sleep(5) # Or any other reasonable number, or a finite loop...
    if task.done():
        print('update_all task completed: {}'.format(task.result()))
    else:
        print('task not completed after 5 seconds, aborting')
        task.cancel()


# DRF's @api_view cannot wrap an async function (it would receive a coroutine
# instead of a Response), so this is a plain Django async view
async def start(request):
    """
    Start crawling.
    """
    if request.method == 'POST':
        print("Crawler: start {}".format(request))

        data = json.loads(request.body or b'{}')
        deep = data.get('deep', False)
        season = data.get('season', settings.CURRENT_SEASON)

        # Once the task is created, it runs concurrently on the server's event loop
        loop = asyncio.get_running_loop()
        task = loop.create_task(update_all_async(season=season, deep_crawl=deep))

        # Fire up a second task that follows up on the first one
        follow_up = loop.create_task(follow_up_task(task))
        for t in (task, follow_up):
            background_tasks.add(t)
            t.add_done_callback(background_tasks.discard)

        return JsonResponse({"Success": "crawl started"})
    else:
        return JsonResponse({"description": "Start the crawler by calling this endpoint via POST.", "allowed_parameters": {
            "deep": "boolean",
            "season": "number"
        }})

# @api_view made the view CSRF-exempt; a plain Django view needs this set explicitly
start.csrf_exempt = True
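The same fire-and-forget pattern can be tried outside Django with a minimal sketch, where handler() is a hypothetical stand-in for the async view and slow_crawl() for the crawler. It keeps strong references to the tasks, as the asyncio documentation recommends, and swaps the sleep-then-check follow-up for asyncio.wait_for(), which cancels the crawl by itself when the timeout expires:

```python
import asyncio

background_tasks = set()  # Strong references; the loop itself only keeps weak ones
log = []


async def slow_crawl():
    # Hypothetical stand-in for tasks.update_all
    await asyncio.sleep(0.2)
    return "crawl finished"


async def follow_up(coro, timeout):
    # wait_for() cancels the wrapped coroutine itself if the timeout expires
    try:
        result = await asyncio.wait_for(coro, timeout)
        log.append("completed: " + result)
    except asyncio.TimeoutError:
        log.append("timed out, crawl cancelled")


def fire_and_forget(coro):
    # Schedule on the running loop and keep a reference until the task is done
    task = asyncio.get_running_loop().create_task(coro)
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return task


async def handler(timeout):
    # Stand-in for the async view: schedule the work, then answer immediately
    fire_and_forget(follow_up(slow_crawl(), timeout))
    return {"Success": "crawl started"}


async def main():
    print(await handler(timeout=1.0))   # {'Success': 'crawl started'}
    print(await handler(timeout=0.05))  # {'Success': 'crawl started'}
    await asyncio.sleep(0.5)            # Demo only: keep the loop alive for the tasks
    print(log)  # ['timed out, crawl cancelled', 'completed: crawl finished']


asyncio.run(main())
```

Both handler() calls return before any crawl has finished; the log is only filled in later by the background tasks.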

  2. async_to_sync

Sometimes you can't route the request to an async function in the first place, as is the case with DRF (at the time of writing). For this, Django provides adapter functions (async_to_sync() and sync_to_async(), from the asgiref package it depends on), but be aware that switching from a sync to an async context, or vice versa, comes with a small performance penalty of approximately 1 ms. Note that this time, the running loop is obtained in the update_all_async function instead:

from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response
from django.conf import settings
from mycrawler import tasks

import asyncio
from asgiref.sync import async_to_sync

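# Strong references to running tasks, since the event loop only keeps weak ones
background_tasks = set()

async def update_all_async(deep_crawl=True, season=settings.CURRENT_SEASON, log_to_db=True):
    # We can use the running loop here: under ASGI, async_to_sync runs this coroutine
    # on the server's event loop, which keeps running after the view has returned
    loop = asyncio.get_running_loop()
    task = loop.create_task(tasks.update_all(deep_crawl, season, log_to_db))
    follow_up = loop.create_task(follow_up_task(task))
    for t in (task, follow_up):
        background_tasks.add(t)
        t.add_done_callback(background_tasks.discard)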

async def follow_up_task(task: asyncio.Task):
    await asyncio.sleep(5) # Or any other reasonable number, or a finite loop...
    if task.done():
        print('update_all task completed: {}'.format(task.result()))
    else:
        print('task not completed after 5 seconds, aborting')
        task.cancel()


@api_view(['POST', 'GET'])
def start(request):
    """
    Start crawling.
    """
    if request.method == 'POST':
        print("Crawler: start {}".format(request))

        deep = request.data.get('deep', False)
        season = request.data.get('season', settings.CURRENT_SEASON)

        # Wrap the coroutine function so it can be called from this sync view
        update_all_sync = async_to_sync(update_all_async)
        update_all_sync(season=season, deep_crawl=deep)

        return Response({"Success": "crawl started"}, status=status.HTTP_200_OK)
    else:
        return Response({"description": "Start the crawler by calling this endpoint via POST.", "allowed_parameters": {
            "deep": "boolean",
            "season": "number"
        }}, status.HTTP_200_OK)

In both cases, the view returns the 200 response quickly, but the second option is technically slower because of the sync/async context switch.
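Besides these two options, the asyncio.Queue approach mentioned above could look roughly like this: the view only enqueues a job and returns, while one long-running worker task processes the jobs one at a time, which also keeps two crawls from overlapping. This is a framework-free sketch; crawl_worker() and the (deep, season) job tuple are hypothetical, and in Django the queue and worker would have to be created on the ASGI server's loop (for example lazily, on the first request):

```python
import asyncio


async def crawl_worker(queue, results):
    # Long-running consumer: processes one crawl job at a time, in arrival order
    while True:
        deep, season = await queue.get()
        try:
            await asyncio.sleep(0.01)  # Stand-in for the real crawl
            results.append((deep, season))
        finally:
            queue.task_done()


async def main():
    queue = asyncio.Queue()
    results = []
    worker = asyncio.create_task(crawl_worker(queue, results))

    # What the view would do: enqueue a job and return without waiting
    queue.put_nowait((True, 2017))
    queue.put_nowait((False, 2018))

    await queue.join()  # Demo only: wait until both jobs have been processed
    worker.cancel()
    return results


print(asyncio.run(main()))  # [(True, 2017), (False, 2018)]
```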

IMPORTANT: When using Django, these async operations commonly involve DB operations. Django's ORM is synchronous (at least at the time of writing), and calling it directly from an async context raises a SynchronousOnlyOperation error, so you have to account for this in asynchronous code. sync_to_async() is very handy for these cases.
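To show the idea behind sync_to_async() without Django, here is a stdlib-only sketch that runs a blocking call (a hypothetical save_result() standing in for an ORM query) in a worker thread via loop.run_in_executor(), so the event loop stays free for other work. This is the stdlib analogue, not Django's API; in a Django project you would write `await sync_to_async(save_result)(2017)` instead:

```python
import asyncio
import threading
import time

saved = []


def save_result(season):
    # Hypothetical blocking call standing in for a Django ORM query
    time.sleep(0.1)
    saved.append((season, threading.current_thread() is threading.main_thread()))


async def heartbeat(ticks):
    # Keeps running on the event loop while save_result() blocks its worker thread
    for _ in range(3):
        await asyncio.sleep(0.02)
        ticks.append("tick")


async def main():
    loop = asyncio.get_running_loop()
    ticks = []
    # run_in_executor(None, ...) runs the blocking function in the default thread pool
    await asyncio.gather(
        loop.run_in_executor(None, save_result, 2017),
        heartbeat(ticks),
    )
    return ticks


ticks = asyncio.run(main())
print(saved, ticks)  # [(2017, False)] ['tick', 'tick', 'tick']
```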


Eduardo Balbinot


In my opinion, you should have a look at Celery, a tool designed specifically for asynchronous tasks. It supports Django and is very useful when you don't want the user to wait for long operations on the server. Each task that runs in the background receives a task_id, which helps if you want to create another service that, given a task_id, returns whether a specific task has succeeded, or how much of it has been done so far.

