janoliver

Reputation: 7824

Why is setting a Django FileField from an existing file on the same partition slow?

In my Django application I have to deal with huge files. Instead of uploading them via the web app, users can place them in a folder (called .dump) on a Samba share and then choose the file in the Django app to create a new model instance from it. The view looks roughly like this:

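import os
from os.path import isfile

from django.conf import settings
from django.core.files import File
from django.http import JsonResponse
from django.views import View

from .models import NCFile  # app-specific model; module path assumed


class AddDumpedMeasurement(View):
    def get(self, request, *args, **kwargs):
        filename = request.GET.get('filename', None)
        if not filename:
            # os.path.join() would raise TypeError on None
            return JsonResponse(data={'error': 'No filename given'}, status=400)

        dump_dir = os.path.join(settings.MEDIA_ROOT, settings.MEASUREMENT_DATA_DUMP_PATH)
        in_file = os.path.join(dump_dir, filename)

        if isfile(in_file):
            try:
                with open(in_file, 'rb') as f:
                    # `sample` is defined elsewhere in the real view (omitted here).
                    # Saving File(f) makes Django read f and write a new copy under MEDIA_ROOT.
                    measurement = NCFile.objects.create(sample=sample, created_by=request.user, file=File(f))

                return JsonResponse(data={'redirect': measurement.get_absolute_url()})
            except OSError:
                return JsonResponse(data={'error': 'Couldn\'t read file'}, status=400)
        else:
            return JsonResponse(data={'error': 'File not found'}, status=400)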

As MEDIA_ROOT and .dump are on the same Samba share (which is mounted by the web server), why is moving the file to its new location so slow? I would have expected it to be almost instantaneous. Is it because I open() it and stream the bytes to the file object? If so, is there a better way to move the file to its correct destination and create the model instance?
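For context, saving File(f) through Django's default FileSystemStorage does not move anything: the storage reads the open file in chunks (File.chunks(), 64 KiB by default) and writes each chunk into a new file under MEDIA_ROOT. The stdlib-only sketch below models that copy loop; stream_copy is a hypothetical name and this is a simplified illustration, not Django's actual code. Its cost grows with the file size even when source and destination sit on the same partition.

```python
import os

# Same value as Django's File.DEFAULT_CHUNK_SIZE (64 KiB); reused here for illustration.
CHUNK_SIZE = 64 * 1024


def stream_copy(src_path, dest_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dest_path chunk by chunk and return the number of bytes written.

    Every byte is read from the source and written to the destination,
    so the work is proportional to the file size, unlike a rename.
    """
    written = 0
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            dest.write(chunk)
            written += len(chunk)
    return written
```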

Upvotes: 1

Views: 514

Answers (2)

janoliver

Reputation: 7824

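Creating the instance with a temporary placeholder file and then replacing the placeholder with the original file lets you use os.rename, which is fast because, within one file system, it only changes directory entries instead of copying the data.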

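import os
from os.path import isfile
from tempfile import NamedTemporaryFile

from django.conf import settings
from django.core.files import File

# `filename` and `in_file` come from the view in the question.
# Create the instance with an empty placeholder so Django assigns the upload path;
# passing name=filename keeps the stored name relative and based on the real file.
tmp_file = NamedTemporaryFile()
measurement = NCFile.objects.create(..., file=File(tmp_file, name=filename))
tmp_file.close()  # NamedTemporaryFile deletes itself on close

# Remove the empty placeholder Django just wrote below MEDIA_ROOT.
if isfile(measurement.file.path):
    os.remove(measurement.file.path)

# Target path in the same directory, avoiding clashes with existing files.
new_relative_path = os.path.join(os.path.dirname(measurement.file.name), filename)
new_relative_path = measurement.file.storage.get_available_name(new_relative_path)

# Same file system: only directory entries change, no data is copied.
os.rename(in_file, os.path.join(settings.MEDIA_ROOT, new_relative_path))
measurement.file.name = new_relative_path
measurement.save()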
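The same idea without Django, as a stdlib-only sketch: move_into_dir and available_name are hypothetical helpers, and the counter suffix is a simplified stand-in for Storage.get_available_name (which appends a random string instead). It also guards against the one case where os.rename is not fast: if source and destination are on different file systems, os.rename raises OSError with errno.EXDEV, and the sketch then falls back to shutil.move, which copies and deletes.

```python
import errno
import os
import shutil


def available_name(directory, filename):
    """Return filename, or filename with a numeric suffix, that does not exist in directory.

    Simplified stand-in for Django's Storage.get_available_name.
    """
    root, ext = os.path.splitext(filename)
    candidate = filename
    counter = 1
    while os.path.exists(os.path.join(directory, candidate)):
        candidate = f"{root}_{counter}{ext}"
        counter += 1
    return candidate


def move_into_dir(src_path, directory, filename):
    """Move src_path into directory, renaming instead of copying when possible.

    Returns the file name actually used inside directory.
    """
    name = available_name(directory, filename)
    dest = os.path.join(directory, name)
    try:
        # Metadata-only operation when both paths are on the same file system.
        os.rename(src_path, dest)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Different file systems: the data has to be copied after all.
        shutil.move(src_path, dest)
    return name
```

The existence check and the rename are not atomic, so two concurrent requests could in principle still pick the same name.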

Upvotes: 2

e4c5

Reputation: 53774

Is it because I open() it and stream the bytes to the file object?

I would argue that it is. A simple move within one file system only updates the file system's internal records (its directory entries); the file's data is not touched. That would indeed be nearly instantaneous.

Opening a local file and reading it chunk by chunk, on the other hand, is effectively a copy operation, which can be slow depending on the file size. Additionally, you are doing this at a very high level (Python code in your web app), whereas an OS copy operation happens at a much lower level.
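One way to see the difference on a local file system: a rename keeps the file's inode number (it is still the same file, only under a new directory entry), while a copy produces a new inode whose contents had to be written byte by byte. The function below is a small illustrative sketch (rename_vs_copy_inodes is a hypothetical name). Inode numbers on network mounts such as CIFS may be generated by the client, so it is best run on a local directory.

```python
import os
import shutil


def rename_vs_copy_inodes(directory):
    """Return (inode unchanged after rename, inode unchanged after copy)."""
    src = os.path.join(directory, "original.bin")
    with open(src, "wb") as f:
        f.write(b"\0" * 1024)
    before = os.stat(src).st_ino

    # Same file system: only the directory entry changes.
    renamed = os.path.join(directory, "renamed.bin")
    os.rename(src, renamed)
    after_rename = os.stat(renamed).st_ino

    # A copy creates a new file; every byte is read and written again.
    copied = os.path.join(directory, "copied.bin")
    shutil.copyfile(renamed, copied)
    after_copy = os.stat(copied).st_ino

    return before == after_rename, before == after_copy
```

On a typical local Linux file system, calling it on an empty temporary directory returns (True, False).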

But that's not the real cause of the problem. You said the files are on a Samba share, which I presume means you have mounted a remote folder locally. So when you read the file in question, you are actually fetching it over the network, which is slower than a disk read. Then, when you write the destination file, you are sending the data back over the network, again slower than a disk write.
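Whether a rename can avoid that network round trip depends on both directories being on the same mount. A quick check is to compare the device IDs (st_dev) of the two paths; same_filesystem is a hypothetical helper. If it returns True for the .dump directory and MEDIA_ROOT, os.rename between them is typically a single rename request to the Samba server rather than a transfer of the file's contents; if it returns False, os.rename fails with EXDEV.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both existing paths live on the same mounted file system.

    Compares the device IDs reported by os.stat(); a rename between such
    paths does not need to copy file data.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

With the names from the question, the check would be same_filesystem(dump_dir, settings.MEDIA_ROOT).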

Upvotes: 1
