Benjamin

Reputation: 3467

Large file upload to Django Rest Framework

I'm trying to upload a large file (4 GB) with a PUT request to a DRF viewset.

Memory usage is stable while the upload is in progress. Once the upload reaches 100%, the Python runserver process uses more and more RAM until the kernel kills it. I have a logging line in the put method of this APIView, but the process is killed before that method is called.

I use this setting to force uploads to be written to temporary files: FILE_UPLOAD_HANDLERS = ["django.core.files.uploadhandler.TemporaryFileUploadHandler"]

Where does this memory peak come from? I guess something tries to load the file content into memory, but why (and where)?

More information:

Upvotes: 6

Views: 3171

Answers (2)

Wonskcalb

Reputation: 383

TL;DR:

This is neither a DRF nor a Django issue; it's a Daphne issue that has been known for about 2.5 years. For the time being, the solution is to use uvicorn, hypercorn, or another server.

Explanations

What you're seeing here is not coming from Django Rest Framework as:

The fact that you mention Daphne reminds me of this SO answer, which describes a similar problem and points to the code showing that Daphne doesn't handle large file uploads well: it loads the whole request body into RAM before passing it to the view. (That code is still present in their master branch at the time of writing.)
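To illustrate the difference, here is a minimal sketch of how a request body could be drained without keeping all of it in RAM. It is not Daphne's code or any server's actual implementation. It only assumes the ASGI convention that the body arrives as "http.request" messages with "body" and "more_body" keys; spool_request_body, fake_receive and MAX_IN_MEMORY are hypothetical names made up for this example.

```python
import asyncio
import tempfile

# Hypothetical threshold: bodies larger than this are spilled to disk.
MAX_IN_MEMORY = 1024 * 1024  # 1 MiB


async def spool_request_body(receive, max_in_memory=MAX_IN_MEMORY):
    """Drain ASGI "http.request" messages into a spooled temporary file.

    Roughly max_in_memory bytes plus one chunk are held in memory at most;
    past that threshold SpooledTemporaryFile rolls over to a file on disk.
    """
    body = tempfile.SpooledTemporaryFile(max_size=max_in_memory, mode="w+b")
    while True:
        message = await receive()
        if message["type"] == "http.disconnect":
            body.close()
            raise ConnectionError("client disconnected during upload")
        body.write(message.get("body", b""))
        if not message.get("more_body", False):
            break
    body.seek(0)  # rewind so the caller can read from the start
    return body


def fake_receive(chunks):
    """Stand-in for a server's receive() callable (illustration only)."""
    pending = list(chunks)

    async def receive():
        chunk = pending.pop(0)
        return {"type": "http.request", "body": chunk, "more_body": bool(pending)}

    return receive


async def main():
    chunks = [b"a" * 65536 for _ in range(64)]  # 4 MiB sent as 64 KiB chunks
    body = await spool_request_body(fake_receive(chunks))
    body.seek(0, 2)     # jump to the end of the spooled file
    size = body.tell()  # total number of bytes received
    body.close()
    return size


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A server that instead concatenates every chunk into one bytes object needs at least as much RAM as the upload is large, which would match the growth you see once the upload reaches 100%.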

You're seeing the same behavior with runserver because, when installed, Daphne replaces the built-in runserver command with its own to provide WebSockets support for development purposes.

To confirm that Daphne is the real culprit, disable Channels so that the default Django runserver is used, and see for yourself whether your app is still killed by the OOM killer.
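A possible way to run that check is to temporarily comment out the app that overrides runserver in your settings. This is only a sketch of a settings fragment: which app does the overriding depends on your Channels version ("channels" in older releases, "daphne" in newer ones, as far as I know), and the other entries are placeholders for whatever your project actually lists.

```python
# settings.py (fragment, for a temporary test only)
# Entries other than "daphne"/"channels" are placeholders for your own apps.
INSTALLED_APPS = [
    # "daphne",    # newer Channels releases: this app overrides runserver
    # "channels",  # older Channels releases: this app overrides runserver
    "django.contrib.contenttypes",
    "django.contrib.auth",
    "rest_framework",
]
```

With the override gone, python manage.py runserver should start Django's own development server, and you can repeat the 4 GB upload against it.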

Upvotes: 6

Caio Kretzer

Reputation: 169

I don't know if it works with Django REST Framework, but you can try writing the file in chunks.

        [...]
        # Requires `import datetime` and `import os` at module level.
        anexo_files = request.FILES.getlist('anexo_file_' + str(k))
        # Number the files from 1 so each stored name is unique.
        for index, file in enumerate(anexo_files, start=1):
            base_name, extension = os.path.splitext(str(file))
            timestamp = datetime.datetime.now().strftime("%m%d%Y%H%M%S")
            nome_arquivo_anexo = 'media/uploads/' + base_name + "_" + str(index) + timestamp + extension
            # Stream the upload to disk chunk by chunk instead of reading it whole.
            handle_uploaded_file(file, nome_arquivo_anexo)

            AnexoProjeto.objects.create(
                projeto=projeto,
                arquivo_anexo=nome_arquivo_anexo,
            )
        [...]

Here handle_uploaded_file is defined as:

def handle_uploaded_file(f, nome_arquivo):
    with open(nome_arquivo, 'wb+') as destination:
        for chunk in f.chunks():
            destination.write(chunk)

Upvotes: 0
