Reputation: 1562
I am just upgrading an older project to Python 3.6, and found out that there are these cool new async / await keywords.
My project contains a web crawler that is not very performant at the moment and takes about 7 minutes to complete. Now, since I already have Django REST framework in place to access data of my Django application, I thought it would be nice to have a REST endpoint that lets me start the crawler remotely with a simple POST request.
However, I don't want the client to wait synchronously for the crawler to complete. I just want to respond straight away that the crawler has been started and then run the crawl in the background.
from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response
from django.conf import settings

from mycrawler import tasks


async def update_all_async(deep_crawl=True, season=settings.CURRENT_SEASON, log_to_db=True):
    await tasks.update_all(deep_crawl, season, log_to_db)


@api_view(['POST', 'GET'])
def start(request):
    """
    Start crawling.
    """
    if request.method == 'POST':
        print("Crawler: start {}".format(request))
        deep = request.data.get('deep', False)
        season = request.data.get('season', settings.CURRENT_SEASON)
        # this should be called async
        update_all_async(season=season, deep_crawl=deep)
        return Response({"Success": "crawl started"}, status=status.HTTP_200_OK)
    else:
        return Response({"description": "Start the crawler by calling this endpoint via POST.",
                         "allowed_parameters": {
                             "deep": "boolean",
                             "season": "number"
                         }}, status=status.HTTP_200_OK)
I have read some tutorials, including ones on how to use event loops, but I don't really get it... Where should I start the loop in this case?
[EDIT] 20/10/2017:
I solved it using threading for now, since it really is a "fire and forget" task. However, I still would like to know how to achieve the same thing using async / await.
Here's my current solution:
import threading


@api_view(['POST', 'GET'])
def start(request):
    ...
    t = threading.Thread(target=tasks.update_all, args=(deep, season))
    t.start()
    ...
Upvotes: 9
Views: 11890
Reputation: 178
This is possible in Django 3.1+, which introduced asynchronous view support.
Regarding the asynchronous running loop, you can make use of it by running Django with uvicorn or any other ASGI server instead of gunicorn or other WSGI servers.
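For example, running the project under uvicorn could look like this (assuming the standard myproject/asgi.py that startproject generates; the project name is a placeholder):

pip install uvicorn
uvicorn myproject.asgi:application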
The difference is that with an ASGI server there is already a running event loop, whereas under WSGI you would need to create one yourself. With ASGI, you can simply define async functions directly in views.py or as methods on your view classes.
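For instance, under ASGI a minimal async view is just an async def (a plain-Django sketch; the view name is illustrative):

# views.py -- minimal async view under Django 3.1+ with an ASGI server
from django.http import JsonResponse

async def ping(request):
    # Runs directly on the server's event loop; no extra setup required
    return JsonResponse({"status": "ok"})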
Assuming you go with ASGI, you have multiple ways of achieving this; I'll describe a couple below. Other options exist as well, for example making use of asyncio.Queue, as sketched next.
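A rough sketch of that queue-based variant (all names here are illustrative assumptions, and the piece that starts the worker on the running loop, e.g. an ASGI startup hook, is not shown):

import asyncio

from mycrawler import tasks

# Module-level job queue; an async view can enqueue work without blocking
crawl_queue: asyncio.Queue = asyncio.Queue()

async def crawl_worker():
    # Long-lived consumer: must be started once on the running event loop
    while True:
        deep, season = await crawl_queue.get()
        try:
            await tasks.update_all(deep, season, True)
        finally:
            crawl_queue.task_done()

# Inside an async view:
#     await crawl_queue.put((deep, season))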
Make start() async
By making start() async, you can make direct use of the existing running loop, and by using asyncio.Task, you can fire and forget into that loop. And if you want to fire but remember, you can create another Task to follow up on the first one, i.e.:
from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response
from django.conf import settings

from mycrawler import tasks

import asyncio


async def update_all_async(deep_crawl=True, season=settings.CURRENT_SEASON, log_to_db=True):
    await tasks.update_all(deep_crawl, season, log_to_db)


async def follow_up_task(task: asyncio.Task):
    await asyncio.sleep(5)  # Or any other reasonable number, or a finite loop...
    if task.done():
        print('update_all task completed: {}'.format(task.result()))
    else:
        print('task not completed after 5 seconds, aborting')
        task.cancel()


@api_view(['POST', 'GET'])
async def start(request):
    """
    Start crawling.
    """
    if request.method == 'POST':
        print("Crawler: start {}".format(request))
        deep = request.data.get('deep', False)
        season = request.data.get('season', settings.CURRENT_SEASON)
        # Once the task is created, it begins running concurrently on the loop
        loop = asyncio.get_running_loop()
        task = loop.create_task(update_all_async(season=season, deep_crawl=deep))
        # Fire up a second task to track the first one
        loop.create_task(follow_up_task(task))
        return Response({"Success": "crawl started"}, status=status.HTTP_200_OK)
    else:
        return Response({"description": "Start the crawler by calling this endpoint via POST.",
                         "allowed_parameters": {
                             "deep": "boolean",
                             "season": "number"
                         }}, status=status.HTTP_200_OK)
Sometimes you can't have an async function to route the request to in the first place, as happens with DRF (as of today). For this, Django provides some useful async adapter functions, but be aware that switching from a sync to an async context, or vice versa, comes with a small performance penalty of approximately 1 ms. Note that this time, the running loop is obtained inside the update_all_async function instead:
from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response
from django.conf import settings

from mycrawler import tasks

import asyncio
from asgiref.sync import async_to_sync


@async_to_sync
async def update_all_async(deep_crawl=True, season=settings.CURRENT_SEASON, log_to_db=True):
    # We can use the running loop here in this use case
    loop = asyncio.get_running_loop()
    task = loop.create_task(tasks.update_all(deep_crawl, season, log_to_db))
    loop.create_task(follow_up_task(task))


async def follow_up_task(task: asyncio.Task):
    await asyncio.sleep(5)  # Or any other reasonable number, or a finite loop...
    if task.done():
        print('update_all task completed: {}'.format(task.result()))
    else:
        print('task not completed after 5 seconds, aborting')
        task.cancel()


@api_view(['POST', 'GET'])
def start(request):
    """
    Start crawling.
    """
    if request.method == 'POST':
        print("Crawler: start {}".format(request))
        deep = request.data.get('deep', False)
        season = request.data.get('season', settings.CURRENT_SEASON)
        # update_all_async is already wrapped by async_to_sync above, so this
        # call fires the tasks on the running loop and returns without blocking
        update_all_async(season=season, deep_crawl=deep)
        return Response({"Success": "crawl started"}, status=status.HTTP_200_OK)
    else:
        return Response({"description": "Start the crawler by calling this endpoint via POST.",
                         "allowed_parameters": {
                             "deep": "boolean",
                             "season": "number"
                         }}, status=status.HTTP_200_OK)
In both cases, the view quickly returns the 200 response, but the second option is technically slower because of the sync/async context switch.
IMPORTANT: When using Django, it is common to have DB operations involved in these async operations. DB operations in Django can only be synchronous, at least for now, so you will have to account for this in asynchronous contexts. sync_to_async() becomes very handy for these cases.
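For example, a blocking ORM call inside one of these coroutines could be wrapped like this (a sketch; CrawlLog is a hypothetical model, not part of the question):

from asgiref.sync import sync_to_async

from myapp.models import CrawlLog  # hypothetical model, for illustration only

async def log_crawl_result(message):
    # sync_to_async runs the blocking ORM call in a worker thread,
    # keeping the event loop free while the query executes
    await sync_to_async(CrawlLog.objects.create)(message=message)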
Upvotes: 8
Reputation: 139
In my opinion, you should have a look at Celery, a great tool designed specifically for asynchronous tasks. It supports Django, and it's very useful when you don't want the user to wait for long operations on the server. Each task that runs in the background receives a task_id, which helps if you want to build another service that, given a task_id, returns whether a specific task has succeeded or not, or how much of it has been done so far.
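A minimal sketch of how the crawler from the question could be wrapped as a Celery task (this assumes a Celery app is already configured for the project, and that tasks.update_all is a plain synchronous function as in the threading workaround; update_all_task is an illustrative name):

# e.g. crawler_tasks.py -- requires a configured Celery app in the project
from celery import shared_task

from mycrawler import tasks

@shared_task
def update_all_task(deep_crawl, season, log_to_db=True):
    # Runs in a Celery worker process, not in the web server
    tasks.update_all(deep_crawl, season, log_to_db)

# In the DRF view: fire and forget, keeping the id for later status checks
#     result = update_all_task.delay(deep, season)
#     result.id  # the task_id a status endpoint could look up via AsyncResult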
Upvotes: 5