hoadlck
hoadlck

Reputation: 13

Django: Usage Of select_related() And Execution Time Performance

I am new to Django and databases, so I am trying to get some perspective on performance. Specifically, I want to understand if select_related() is working the way I think it does.

Here is a simplified version of my model:

class User(models.Model):
    short = models.CharField(max_length=255)
    name = models.CharField(max_length=255)

class Comment(models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()
    short = models.ForeignKey(User)

In my template, I need to show the user's short name next to the comment title. My test database size has 1000 users, and 19000 comments.

At first, I was retrieving the list as follows:

cmt_list = Comment.objects.all().order_by('title')

My template was accessing the short foreign key relationship, which was causing extra hits to the database. Retrieving all of the data took ~30s. Which was pretty horrible.

I knew that this was much faster is raw SQL, and I could not figure out how to do this via the Django ORM. So, I used the low level interface:

from django.db import connection

cursor = connection.cursor()
cursor.execute("SELECT app_comment.title,app_user.short       \
                       FROM app_comment,app_user              \
                       WHERE app_comment.short_id=app_user.id \
                       ORDER BY app_comment.title"
              )
raw_list = cursor.fetchall()
cmt_list = [ {"title":entry[0], "short":entry[1]} for entry in raw_list]

Retrieving all of the data took ~233ms. That is what I was expecting!

After reading more of the docs, I discovered the select_related() feature in Django. So, I tried that:

cmt_list = Comment.objects.all().select_related('short__short').order_by('title')

Retrieving all of the data took ~1.3s. Much better than the original, but still pretty slow compared to raw SQL queries.

The Question

What am I doing wrong in the way that I am using select_related()/the Django ORM? I understand that the ORM will add some overhead, but 1.3s versus 233ms seems excessive. Or, is this expected, and I just need to get over it?

How would I fashion a query using the ORM that would be equivalent to the raw SQL query I made? Given what I understand about select_related(), my raw SQL query and my Django one should be roughly equivalent. (The Django query will be grabbing more content, but for my test data there would not be much extra retrieved.)

Upvotes: 1

Views: 1318

Answers (2)

hoadlck
hoadlck

Reputation: 13

Following the suggestions in the answer, I updated my query using the Django ORM. I had to add a little list comprehension to give the same input to my template. Here is the result:

    cmt_list = [ {"title":entry["title"], "short":entry["short__short"]} 
                 for entry in 
                 Comment.objects.order_by('title').values("title", 'short__short')
               ]

The time to do the query was ~280ms, which is much more in-line with what I can get from using raw SQL.

To determine what the raw SQL was behind the scenes in the Django ORM, I used the following print statement...

print(Comment.objects.order_by('title').values("title", 'short__short').query)

The results were as follows:

SELECT "app_comment"."title", "app_user"."short" 
FROM "app_comment" INNER JOIN "app_user" 
ON ( "app_comment"."short_id" = "app_user"."id" ) 
ORDER BY "app_comment"."title" ASC

Which was the same as what I did raw.

This solution did not require select_related() at all.

Upvotes: 0

knbk
knbk

Reputation: 53729

Actually, the ORM can cause quite a bit of overhead. The following statement produce identical queries (you can check it by comparing str(cmt_list.query)):

cmt_list1 = Comment.objects.all()
cmt_list2 = Comment.objects.values('id', 'title', 'content', 'short_id')

However, a simple test using timeit.timeit (with a different model in my local project) shows the second method is a bit more than twice as fast as the first method.

That, and don't forget that a TextField is a huge amount of data compared to an int or varchar(255) column. I'm sure that if you fetch the content column in your raw sql query, the figures will be a lot closer.

Upvotes: 1

Related Questions