Arctelix
Arctelix

Reputation: 4576

Quantity of database query's vs application memory performance using Django

If I need a total of all objects in a query set as well as a slice of filed values from those objects, which option would be better considering speed and application memory use (I am using a PostgreSQL backend):

Option a:

def get_data():
    queryset = MyObject.objects.all()
    total_objects = queryset.count()
    thumbs = queryset[:5].values_list('thumbnail', flat=True)
    return {total_objects:total_objects, thumbs:thumbs}

Option b:

def get_data():
    objects = list(MyObject.objects.all())
    total_objects = len(objects)
    thumbs = [o.thumbnail for o in objects[:5]]
    return {total_objects:total_objects, thumbs:thumbs}

If I understand things correctly, and certainly correct me if I am wrong:

Option a: It will hit the database two times and will result in only total_objects = integer and thumbs = list of strings in memory.

Option b: It will hit the database one time and will result in a list of all objects and all their filed data + option a items in memory.

Considering these options and that there are potentially millions of instances of MyObject: Is the speed of one data base hit (options a) preferable to the memory consumption of a single data base hit (option b)?

My priority is for overall speed in returning the data, but I am concerned about the larger memory consumption slowing things down even more than the extra database hit.

Upvotes: 0

Views: 165

Answers (1)

yuvi
yuvi

Reputation: 18427

Using SQL is the fastest method and will always beat the Python equivalent, even if it hits the database more. The difference is negligible in comparison. Remember, that's what SQL is meant to do - be fast and efficient.

Anyway, running a thousand loops using timeit, these are the results:

In [8]: %timeit get_data1() # Using ORM
1000 loops, best of 3: 628 µs per loop

In [9]: %timeit get_data2() # Using python
1000 loops, best of 3: 1.54 ms per loop

As you can see, the first method takes 628 microseconds per loop, while the second one takes 1.54 milliseconds. That's almost 2.5 times as much! A clear winner.

I used an SQLite database with only 100 objects in it (I used autofixture to spam the models). I'm guessing PostgreSQL will return different results, but I am still in favor of the first one.

Upvotes: 1

Related Questions