Reputation: 7824
In my Django application I have to deal with huge files. Instead of uploading them via the web app, users may place them in a folder (called .dump
) on a Samba share and then choose the file in the Django app to create a new model instance from it. The view looks roughly like this:
import os
from os.path import isfile

from django.conf import settings
from django.core.files import File
from django.http import JsonResponse
from django.views import View

class AddDumpedMeasurement(View):
    def get(self, request, *args, **kwargs):
        filename = request.GET.get('filename', None)
        dump_dir = os.path.join(settings.MEDIA_ROOT, settings.MEASUREMENT_DATA_DUMP_PATH)
        in_file = os.path.join(dump_dir, filename)
        if isfile(in_file):
            try:
                with open(in_file, 'rb') as f:
                    object = NCFile.objects.create(sample=sample, created_by=request.user, file=File(f))
                return JsonResponse(data={'redirect': object.get_absolute_url()})
            except OSError:
                return JsonResponse(data={'error': "Couldn't read file"}, status=400)
        else:
            return JsonResponse(data={'error': 'File not found'}, status=400)
As MEDIA_ROOT and .dump are on the same Samba share (which is mounted by the web server), why is moving the file to its new location so slow? I would have expected it to be almost instantaneous. Is it because I open() the file and stream its bytes into the file object? If so, is there a better way to move the file to its correct destination and create the model instance?
Upvotes: 1
Views: 514
Reputation: 7824
Creating the model instance with a temporary placeholder file and then replacing that placeholder with the original file allows the use of os.rename, which is fast:
import os
from tempfile import NamedTemporaryFile

from django.conf import settings
from django.core.files import File

# Create the instance with an empty placeholder so Django's storage
# picks a destination path for us.
tmp_file = NamedTemporaryFile()
object = NCFile.objects.create(..., file=File(tmp_file))
tmp_file.close()

# Remove the placeholder and rename the dumped file into its place.
if os.path.isfile(object.file.path):
    os.remove(object.file.path)
new_relative_path = os.path.join(os.path.dirname(object.file.name), filename)
new_relative_path = object.file.storage.get_available_name(new_relative_path)
os.rename(in_file, os.path.join(settings.MEDIA_ROOT, new_relative_path))
object.file.name = new_relative_path
object.save()
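One caveat worth noting (not part of the original answer): os.rename only succeeds when source and destination are on the same filesystem; across filesystems it raises OSError with errno EXDEV. A minimal sketch of a fallback, with a hypothetical helper name:

```python
import errno
import os
import shutil


def fast_move(src, dst):
    """Move a file, preferring an instantaneous rename.

    os.rename() only updates filesystem metadata, but it requires src
    and dst to live on the same filesystem (e.g. the same Samba mount).
    Across filesystems it raises OSError (EXDEV), in which case we fall
    back to shutil.move(), which copies the bytes and removes src.
    """
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        shutil.move(src, dst)
```

Since MEDIA_ROOT and the .dump folder sit on the same mount here, the rename branch is the one that runs, which is why the operation becomes near-instant.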
Upvotes: 2
Reputation: 53774
Is it because I open() it and stream the bytes to the file object?
I would argue that it is. A simple move operation on a filesystem object just updates a record in the filesystem's internal metadata, and that would indeed be almost instantaneous.
Opening a file and reading it byte by byte, on the other hand, is a copy operation, which can be slow depending on the file size. Additionally, you are doing this at a very high level, while an OS move happens at a much lower level.
But that's not the real cause of the problem. You have said the files are on a Samba share, which I presume means you have mounted a remote folder locally. So when you read the file in question, you are actually fetching it over the network, which is slower than a disk read. Then, when you write the destination file, you are sending the data over the network again, an operation that is likewise slower than a disk write.
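To make the difference concrete, here is a rough sketch (with hypothetical helper names) of what the two approaches boil down to: saving an opened file through Django's storage streams every byte through Python, while os.rename is a single filesystem call that only relinks the directory entry.

```python
import os


def copy_by_streaming(src, dst, chunk_size=64 * 1024):
    """Roughly what saving an open File through Django's storage does:
    every byte passes through Python, and on a Samba mount it crosses
    the network twice (once for the read, once for the write)."""
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)


def move_by_rename(src, dst):
    """A single call that updates filesystem metadata; no file contents
    are transferred (works only within one filesystem)."""
    os.rename(src, dst)
```

This is why the rename-based approach in the other answer is near-instant regardless of file size, while the streaming approach scales with the number of bytes.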
Upvotes: 1