szymond
szymond

Reputation: 1310

Lazy queries in Django using RawQuerySet

I am having problems understanding behavior of raw queries in Django. The problem is with number of queries to database while using RawQuerySet. It looks like each time I use RawQuerySet new database query is send. For example with this code:

    reviews = Reviews.objects.raw('select * from reviews rv [...]')

    count = len(list(reviews))
    avg_rating = 0.0

    if count > 0:
        for r in reviews:
            avg_rating = avg_rating + r.stars       
            avg_rating = avg_rating/float(count)

    avg_rating = avg_rating/float(count)

each use of "reviews" makes new, identical query. Why RawQuerySet is so lazy? It seems to me that once someone decides to use raw queries, he doesn't need additional "help" from Django. Am I missing something? Is there a way to make in such case only one query? Right know I decided to cast Reviews.objects.raw() to list() and it helps, but it still bothers me why is it done this way.

Upvotes: 1

Views: 1278

Answers (1)

bruno desthuilliers
bruno desthuilliers

Reputation: 77912

The problem comes from your list(reviews) expression - this force the RawQuerySet to try to build full model instances from the initial query one way or another (and possibly not in the most efficient way but that's another story). Without this line and assuming you're not accessing related objects, there would be only one query made, as you can see in this snippet:

>>> from survey.models import Question
>>> from django.db import connection
>>> import pprint
>>> connection.queries
[]
>>> raw = Question.objects.raw("select * from survey_question")
>>> for q in raw:
...     print q.id
... 
1
2
3
4
>>> connection.queries
[{'time': '0.000', 'sql': 'select * from survey_question'}]
>>> list(raw)
[<Question: Survey Essai1 (root) question #1 - Depuis combien de temps programmez vous ?>, <Question: Survey Essai1 (root) question #2 - Comment avez vous débuté ?>, <Question: Survey Essai1 (root) question #3 - Quel est votre niveau de formation>, <Question: Survey Essai1 (root) question #4 - Cette formation porte-t-elle sur l'informatique ?>]
>>> pprint.pprint(connection.queries)
[{'sql': 'select * from survey_question', 'time': '0.000'},
 {'sql': 'select * from survey_question', 'time': '0.000'},
 {'sql': 'SELECT `survey_survey`.`id`, `survey_survey`.`user_id`, `survey_survey`.`title`, `survey_survey`.`notes`, `survey_survey`.`description`, `survey_survey`.`instructions`, `survey_survey`.`date_created`, `survey_survey`.`starts_on`, `survey_survey`.`ends_on` FROM `survey_survey` WHERE `survey_survey`.`id` = 1 ',
  'time': '0.001'},
 {'sql': 'SELECT `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined` FROM `auth_user` WHERE `auth_user`.`id` = 1 ',
  'time': '0.001'},
 {'sql': 'SELECT `survey_survey`.`id`, `survey_survey`.`user_id`, `survey_survey`.`title`, `survey_survey`.`notes`, `survey_survey`.`description`, `survey_survey`.`instructions`, `survey_survey`.`date_created`, `survey_survey`.`starts_on`, `survey_survey`.`ends_on` FROM `survey_survey` WHERE `survey_survey`.`id` = 1 ',
  'time': '0.000'},
 {'sql': 'SELECT `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined` FROM `auth_user` WHERE `auth_user`.`id` = 1 ',
  'time': '0.000'},
 {'sql': 'SELECT `survey_survey`.`id`, `survey_survey`.`user_id`, `survey_survey`.`title`, `survey_survey`.`notes`, `survey_survey`.`description`, `survey_survey`.`instructions`, `survey_survey`.`date_created`, `survey_survey`.`starts_on`, `survey_survey`.`ends_on` FROM `survey_survey` WHERE `survey_survey`.`id` = 1 ',
  'time': '0.000'},
 {'sql': 'SELECT `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined` FROM `auth_user` WHERE `auth_user`.`id` = 1 ',
  'time': '0.000'},
 {'sql': 'SELECT `survey_survey`.`id`, `survey_survey`.`user_id`, `survey_survey`.`title`, `survey_survey`.`notes`, `survey_survey`.`description`, `survey_survey`.`instructions`, `survey_survey`.`date_created`, `survey_survey`.`starts_on`, `survey_survey`.`ends_on` FROM `survey_survey` WHERE `survey_survey`.`id` = 1 ',
  'time': '0.000'},
 {'sql': 'SELECT `auth_user`.`id`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`password`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`is_superuser`, `auth_user`.`last_login`, `auth_user`.`date_joined` FROM `auth_user` WHERE `auth_user`.`id` = 1 ',
  'time': '0.000'}]
>>> 

Now the point is: why do you use a RawQueryset at all for the above computation ?

I assume the first avg_rating = avg_rating/float(count) line the one in the loop - is a copy-paste error (if not I'd be curious to know more about your definition of "average"). Then you only need one query (no loop nor python computation required) to ask the database to do the computation. You can do this either using raw sql:

select avg(stars) from reviews;

or using the ORM's aggregation functions (cf https://docs.djangoproject.com/en/1.4/topics/db/aggregation/):

>>> from django.db.models import Avg
>>> Book.objects.all().aggregate(Avg('price'))

Upvotes: 1

Related Questions