Reputation:
I have a rather performance related question about django queries.
Say I have a table of employees with 10,000 records. Now If I'm looking to select 5 random employees that are of age greater than or equal to 20, let's say some 5,500 employees are 20 or older. The django query would be:
Employee.objects.filter(age__gte=20).order_by('?')[:5]
and the raw counterpart of this query in mysql will be:
SELECT * FROM `database`.`employee`
WHERE `employee`.`age` >= 20
ORDER BY RAND ()
LIMIT 5;
From the looks of django query the database first returns the 5,500 records, then python sorts these records on random or whatever order we select and a chunk of first five records is returned whereas the raw query will return only five records from the database directly.
My question is that is there any performance difference between both the queries? If so which one is better and why?
Upvotes: 5
Views: 1328
Reputation: 53734
Any performance issue that you are seeing is not because of django. As Shang Wang as already pointed out the query executed by Django is exactly the same as the raw query. The trouble is with the rand() call.
As has been explained elsewhere including some StackOveflow answers like this one: https://stackoverflow.com/a/6911022/267540 trouble is caused by the fact that a random value has to be calculated for each row. Luckily you have a WHERE clause that cuts down the number of rows. BUt as the data grows your query will gradually become slower and slower.
This link has several different suggestions on how to overcome this problem http://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/
Upvotes: 0
Reputation: 25539
I did a quick check on my existing project:
queryset = BlahModel.objects.order_by('?')[:5]
print queryset.query
The result is:
SELECT `blah_model`.`id`, `blah_model`.`date` FROM `blah_model` ORDER BY RAND() LIMIT 5;
So, they are the same.
I wouldn't be too surprise at the result, because django ORM is a direct mapping between sql query result and django object, so order_by('?')
would be equal to ORDER BY RAND()
, even the [:5]
statement is translated to LIMIT
in mysql (here's the doc and doc).
Upvotes: 1