Uday Bhadang
Uday Bhadang

Reputation: 37

having trouble understanding CASE WHEN statement in SQL

SELECT    state, 
    COUNT(CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END) as count_high_elevation_aiports 
FROM airports 
GROUP BY state;

In the above statement, what is the THEN 1 and what does '1' signify? How does that value '1' after THEN affect the output?

Upvotes: 4

Views: 49

Answers (2)

Michael - sqlbot
Michael - sqlbot

Reputation: 179244

First, note that these three expressions are equivent:

CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END
IF(elevation >= 2000, 1, NULL)
((elevation >= 2000) OR NULL)

If elevation >= 2000, the expression evaluates as "1", otherwise the expression evaluates as NULL.

"1" is conventionally used as boolean true, and you could substitute the MySQL literal TRUE in the above expressions with equivalent results... but that isn't what the "1" is for, here.

When used with COUNT(), in cases like this, the only real significance of 1 is that it is not NULL.

This is important, because -- contrary to popular belief -- COUNT() does not count rows. It counts values.

What's the difference? NULL is not technically a value. Instead, it is a marker that signifies the absence of a value, thus COUNT(expr) only counts rows where expr is not null.

By using an expression like the one here, you're asking the server to count the rows with elevation => 2000, and you do this by giving COUNT() a NULL for rows you want not to be counted... and a non-null value for rows you do.

Aggregate (GROUP BY) functions operate on values -- and NULL, again, is not a value in this sense.

Another aggregate function that makes this rationale perhaps even more clear is AVG(). If you had 3 rows... with values 5, NULL, and 10... what's the average? If you said 7.5, that's correct: the average of these 3 rows is (5 + 10) ÷ 2 = 5 because the 3 rows have only two values. NULL is not 0, otherwise the average would be (5 + 0 + 10) ÷ 3 = 5, which it is not.

So, that's how and why this works.

How does that value '1' after THEN affect the output?

It really doesn't. You could just as easily have said COUNT(CASE WHEN elevation >= 2000 THEN 'cat videos are funny' ELSE NULL END) because, just like the literal 1, the literal string 'cat videos are funny' is also not null, and non-null values -- anything not null -- is what count will count.

A novice might try to accomplish this task with COUNT(elevation >= 2000), but that gives the wrong answer, because the 0 (false) for rows where elevation is < 2000 is not null, so these rows would still be counted.

You may then ask, "why not just use COUNT(*) ... WHERE elevation >= 2000?" Good question. The reasons vary, but if you GROUP BY state and there are states with no rows matching WHERE, those states would be entirely eliminated from the results, which is often not what you want. This query includes them, with a count of zero.


Note that ((elevation >= 2000) OR NULL), the third example expression at the top, doesn't actually need the parentheses. I included them because this form is not necessarilly intuitive at first glance. The natural precedence of operations will cause this to be evaluated correctly if written simply elevation >= 2000 OR NULL. This expression is equivalent to the other two because elevation >= 2000 first evaluates to 1 if true, 0 if false, or NULL if elevation is null. Then the lower-precedence OR is evaluated, and you get one of these: 1 OR NULL => 1 ... 0 OR NULL => NULL ... NULL OR NULL => NULL... and you may actually be awarded a SQL wizard badge by the elders of the Internet at the point when writing queries with COUNT(elevation >= 2000 OR NULL) comes naturally to you.

Upvotes: 2

rathodbhavikk
rathodbhavikk

Reputation: 416

Query will simply return 1 if elevation is > or = 2000, else it will return NULL (this is use full for boolean representation of field because NULL represents 0), Now returned value will be set into count_high_elevation_airports.

Upvotes: 1

Related Questions