ariel
ariel

Reputation: 16150

Which is faster: Sum(Case When) Or Group By/Count(*)?

I can write

Select 
Sum(Case When Resposta.Tecla = 1 Then 1 Else 0 End) Valor1,
Sum(Case When Resposta.Tecla = 2 Then 1 Else 0 End) Valor2,
Sum(Case When Resposta.Tecla = 3 Then 1 Else 0 End) Valor3,
Sum(Case When Resposta.Tecla = 4 Then 1 Else 0 End) Valor4,
Sum(Case When Resposta.Tecla = 5 Then 1 Else 0 End) Valor5
From Resposta

Or

Select 
    Count(*)
From Resposta Group By Tecla

I tried this over a large number of rows and it seems like taking the same time.

Anyone can confirm this?

Upvotes: 2

Views: 8025

Answers (5)

bozdoz
bozdoz

Reputation: 12860

To fairly compete with count(*), Your first SQL should probably be:

Select 
Sum(Case When Resposta.Tecla >= 1 AND Resposta.Tecla <=5 Then 1 Else 0 End) Valor
From Resposta

And to answer your question, I'm not noticing a difference at all in speed between SUM CASE WHEN and COUNT. I'm querying over 250,000 rows in POSTGRESQL.

Upvotes: 0

boblemar
boblemar

Reputation: 1153

I believe the Group By is better because there are no specific treatments. It can be optimized by the database engine. I think the results may depend on the database engine you use. Maybe the one you are using optimizes the first query anderstanding it is like a group by !

You can try the "explain / explain plan" command to see how the engine is computing your querys but with my Microsoft SQL Server 2008, I just can see a swap between 2 operations ("Compute scalar" and "agregate").

I tried such queries on a database table :

  • SQL Server 2k8
  • 163000 rows in the table
  • 12 cathegories (Valor1 -> Valor12)

the results are quite differents :

  • Group By : 2seconds
  • Case When : 6seconds !

So My choice is "Group By". Another benefit is the query is simplyer to write !

Upvotes: 4

nonsleepr
nonsleepr

Reputation: 811

In my opinion GROUP BY statement will always be faster than SUM(CASE WHEN ...) because in your example for SUM ... there would be 5 different calculations while when using GROUP BY, DB will simply sort and calculate.

Imagine, you have a bag with different coins and you need to know, how much of earch type of coins do you have. You can do it this ways:

  • The SUM(CASE WHEN ...) way would be to compare each coin with predefined sample coins and do the math for each sample (add 1 or 0);
  • The GROUP BY way would be to sort coins by their types and then count earch group.

Which method would you prefer?

Upvotes: 2

Nikoloff
Nikoloff

Reputation: 4160

What the DB does internally with the second query is practically the same as what you explicitly tell it to do with the first. There should be no difference in the execution plan and thus in the time the query takes. Taking this into account, clearly using the second query is better:

  • it's much more flexible, when there are more values of Tecla you don't need to change your query
  • it's easier to understand. If you have a lot of values for Tecla it'll be harder to read the first query and realize it just counts distinct values
  • it's smaller - you're sending less information to the DB server and it will probably parse the query faster, which is the only performance difference I see in this queries. This makes a difference, albeit small

Upvotes: 3

Damien_The_Unbeliever
Damien_The_Unbeliever

Reputation: 239724

Either one is going to have to read all rows from Resposta, so for any reasonably sized table, I'd expect the I/O cost to dominate - giving approximately the same overall runtime.

I'd generally use:

Select
    Tecla,
    Count(*)
From Resposta
Group By Tecla

If there's a reasonable chance that the range of Tecla values will change in the future.

Upvotes: 2

Related Questions