ErezO

Reputation: 364

django query optimization - when to use sql and when to use python

I'm looking for a general rule of thumb on when it's faster to re-query the database, and when it's faster to use Python and extract the data from the cache.

Let's assume I need to extract two things simultaneously from the database: all pizzas, and a specific pizza with pk=5.

What's more optimized:

pizzas = Pizza.objects.all()
specific_pizza = Pizza.objects.get(pk=5)

OR

pizzas = Pizza.objects.all()
for pizza in pizzas:
    if pizza.pk == 5:
        specific_pizza = pizza
        break 

Of course it depends on the database. For example, if the pizzas table has 10 million rows, it's obvious that re-querying SQL is better, and if it has 10 rows, even if the field is indexed, Python is probably faster.

Can anyone help with what's more optimized in the middle range? For example, when pizzas is hundreds of rows? Thousands of rows?
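To make the tradeoff concrete without a database, here's a rough sketch of the scan-vs-lookup part using plain Python objects (FakePizza is a hypothetical stand-in for the model, not the real thing): a linear scan over the cached list is O(n), while an indexed lookup behaves more like a dict access.

```python
import timeit
from dataclasses import dataclass

@dataclass
class FakePizza:
    pk: int

# Stand-in for an already-evaluated queryset.
pizzas = [FakePizza(pk=i) for i in range(1000)]

def find_by_scan(target):
    # The "use Python" option: walk the cached list.
    for pizza in pizzas:
        if pizza.pk == target:
            return pizza
    return None

# An indexed DB lookup behaves more like a hash/dict access.
by_pk = {pizza.pk: pizza for pizza in pizzas}

scan = timeit.timeit(lambda: find_by_scan(999), number=1_000)
index = timeit.timeit(lambda: by_pk[999], number=1_000)
print(scan > index)  # True: the scan loses badly for late hits as n grows
```

This only models the in-process cost, of course; a real query also pays network and parsing overhead, which is exactly the part that varies with your setup.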

Upvotes: 0

Views: 584

Answers (3)

ErezO

Reputation: 364

I appreciate the responses from @ch3ka and @goncalopp, but I didn't think they directly answered the question, so here's my attempt at some profiling myself:

Bottom line: I found the point where a Python lookup is roughly even with SQL to be around 1000 entries:

Assuming I've already queried the database and received 1000 pizzas:

pizzas = Pizza.objects.all()

I did two tests:

Test 1: Find a specific pizza among 1000 pizzas by looking at pks:

for pizza in pizzas:
    if pizza.pk == 500:
        specific_pizza = pizza
        break 

Took 0.2 milliseconds

Test 2: Filter according to a field of pizza, and create a new list:

mushroom_pizzas = [pizza for pizza in pizzas if pizza.topping == Pizza.MUSHROOM]

where MUSHROOM is an enum value for a possible topping. I chose an enum because I think it's a fair comparison to an indexed DB field.

Took 0.3 milliseconds

Using the Django Debug Toolbar, the time a simple indexed SQL query takes is around 0.3 milliseconds.

I agree with @goncalopp and @ch3ka that since simple indexed queries already take only 0.3 milliseconds, there's really no point dropping to Python for optimization. So even if I know in advance that the number of entries will be less than 1000, or even far less, I would still always use SQL.
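For anyone who wants to reproduce the in-Python half of these numbers without a database, here's a rough timeit version of both tests over 1000 plain objects (FakePizza and its MUSHROOM constant are hypothetical stand-ins; absolute numbers will vary by machine):

```python
import timeit

class FakePizza:
    MUSHROOM = 1  # stand-in for the topping enum

    def __init__(self, pk, topping):
        self.pk = pk
        self.topping = topping

# 1000 "pizzas", half of them mushroom.
pizzas = [FakePizza(pk=i, topping=FakePizza.MUSHROOM if i % 2 else 0)
          for i in range(1, 1001)]

def test1():
    # Test 1: find pk=500 with a linear scan.
    for pizza in pizzas:
        if pizza.pk == 500:
            return pizza

def test2():
    # Test 2: filter on a field with a list comprehension.
    return [p for p in pizzas if p.topping == FakePizza.MUSHROOM]

reps = 1_000
t1 = timeit.timeit(test1, number=reps) / reps * 1000  # ms per call
t2 = timeit.timeit(test2, number=reps) / reps * 1000
print(f"test1: {t1:.3f} ms, test2: {t2:.3f} ms")
```

Note that test2 always walks all 1000 entries while test1 stops at pk=500, which is consistent with the 0.2 ms vs 0.3 ms gap above.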

I'd appreciate any comments if I miscalculated or reached a wrong conclusion.

Upvotes: 0

ch3ka

Reputation: 12158

for example, if pizzas are 10 million rows, it's obvious that re-querying sql is better, and if pizzas are 10 rows, even if the field is indexed, python is probably faster.

Well... first statement: yes. Second statement: not sure, but also not important, because when there are only a few pizzas, neither approach will take a noticeable amount of time.

Can anyone help what's more optimized in the middle range?

Not the answer you expected, I guess, but yes: we agree that using .get() is faster when there are many pizzas, and we see that performance is only a concern when there are many pizzas. Considering that the number of pizzas may grow in the future, I think we can agree that using .get() is the right thing to do.

Performance aside - it's also clearly more readable, so you really should go that route.

Also, note that you can use methods on a QuerySet (.all() returns a QuerySet!) to filter for what you want. How this works is "magic behind the scenes" - and as such it can be assumed to be optimized until you find evidence against that assumption. So use those methods until you hit a point where targeted optimization is really needed. And if you ever hit that point, you can benchmark away and get a reliable answer.

Upvotes: 1

loopbackbee

Reputation: 23322

There's no definitive answer to this question - as you said, it depends on the database (and probably also its location, the number and size of the tables, ...). You'll have to test in your particular environment.

Besides raw speed, there are some important advantages to using the first version:

  • It's shorter and clearer
  • The ORM knows exactly what you want, so any further optimizations can be done at that level, instead of pushing them to your application
  • It avoids doing (potentially) intensive computation in your web server

Also, some food for thought: if your tables are small enough that Python is faster than the DB, does speed even matter?

You may want to read up on premature optimization.

Upvotes: 2
