Reputation: 364
I'm looking for a general rule of thumb on when it's faster to re-query the database, and when it's faster to use python and extract data from the cache.
Let's assume I need to extract two things simultaneously from the database: all pizzas, and a specific pizza with pk=5.
What's more optimized:
pizzas = Pizza.objects.all()
specific_pizza = Pizza.objects.get(pk=5)
OR
pizzas = Pizza.objects.all()
for pizza in pizzas:
    if pizza.pk == 5:
        specific_pizza = pizza
        break
Of course it depends on the database. For example, if pizzas has 10 million rows, it's obvious that re-querying SQL is better, and if pizzas has 10 rows, even if the field is indexed, Python is probably faster.
Can anyone help with what's more optimized in the middle range? For example, when pizzas is hundreds of rows? Thousands of rows?
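For context, assume a minimal model along these lines (field names and choices are just illustrative; the topping choices come up again in my follow-up below):
from django.db import models

class Pizza(models.Model):
    MUSHROOM = 0
    PEPPERONI = 1
    TOPPING_CHOICES = (
        (MUSHROOM, 'Mushroom'),
        (PEPPERONI, 'Pepperoni'),
    )
    # db_index=True so the Python-vs-SQL comparison is against an indexed field
    topping = models.IntegerField(choices=TOPPING_CHOICES, db_index=True)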
Upvotes: 0
Views: 584
Reputation: 364
I appreciate @ch3ka's and @goncalopp's responses, but I didn't think they directly answered the question, so here's my shot at some profiling of my own:
Assuming I've already queried the database and received 1000 pizzas:
pizzas = Pizza.objects.all()
I did two tests:
Test 1: Find a specific pizza among 1000 pizzas by looking at pks:
for pizza in pizzas:
    if pizza.pk == 500:
        specific_pizza = pizza
        break
Took 0.2 milliseconds
Test 2: Filter on a field of pizza and create a new list:
mushroom_pizzas = [pizza for pizza in pizzas if pizza.topping == Pizza.MUSHROOM]
where MUSHROOM is an enum value for a possible topping. I chose an enum because I think it's a fair comparison to an indexed DB field.
Took 0.3 milliseconds
Using Django debug toolbar, the time a simple indexed SQL query takes is around 0.3 milliseconds.
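For anyone who wants to reproduce this, a timeit harness along these lines should do it (just a sketch; the function names are illustrative):
import timeit

# force evaluation up front so only the in-Python work is timed
pizzas = list(Pizza.objects.all())

def find_by_pk(pk=500):
    for pizza in pizzas:
        if pizza.pk == pk:
            return pizza

def filter_by_topping():
    return [pizza for pizza in pizzas if pizza.topping == Pizza.MUSHROOM]

# average over many runs to smooth out noise
print(timeit.timeit(find_by_pk, number=1000) / 1000)
print(timeit.timeit(filter_by_topping, number=1000) / 1000)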
I'd appreciate any comments if I miscalculated or reached a wrong conclusion.
Upvotes: 0
Reputation: 12158
For example, if pizzas has 10 million rows, it's obvious that re-querying SQL is better, and if pizzas has 10 rows, even if the field is indexed, Python is probably faster.
Well... first statement: yes. Second statement: not sure, but also not important, because when there are only a few pizzas, neither approach will take a noticeable amount of time.
Can anyone help with what's more optimized in the middle range?
Not quite the answer you expected, I guess, but yes: since we agree that .get() will be faster when there are many pizzas, since performance only becomes a concern when there are many pizzas, and considering that the number of pizzas may grow in the future, I think we can agree that using .get() is the right thing to do.
Performance aside - it's also clearly more readable, so you really should go that route.
Also, note that you can use methods on a QuerySet (.all() returns a QuerySet!) to filter what you want. How this works is "magic behind the scenes", and as such it can be assumed to be optimized until evidence is found against that assumption. So use those methods until you hit a point where targeted optimization is really needed. And if you ever hit that point, you can benchmark away and have a reliable answer.
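For example (a quick sketch; Pizza.MUSHROOM is borrowed from your follow-up answer):
# both lookups pushed down to the database
specific_pizza = Pizza.objects.get(pk=5)
mushroom_pizzas = Pizza.objects.filter(topping=Pizza.MUSHROOM)

# QuerySets are lazy: no SQL runs until you actually iterate (or slice, or call len())
for pizza in mushroom_pizzas:
    print(pizza.pk)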
Upvotes: 1
Reputation: 23322
There's no definitive answer to this question - as you said, it depends on the database (and probably also its location, the number and size of the tables, ...). You'll have to test in your particular environment.
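One quick way to measure in your own environment: when DEBUG=True, Django records every query and how long it took (a minimal sketch):
from django.db import connection, reset_queries

reset_queries()
specific_pizza = Pizza.objects.get(pk=5)
# each entry holds the raw SQL and its duration, e.g. {'sql': '...', 'time': '0.000'}
print(connection.queries)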
Besides raw speed, there are some important advantages to using the first version.
Also, some food for thought: if your tables are small enough that Python is faster than the DB, does speed matter?
You may want to read up on premature optimization.
Upvotes: 2