Reputation: 37
SELECT state,
COUNT(CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END) as count_high_elevation_aiports
FROM airports
GROUP BY state;
In the above statement, what is the THEN 1 and what does '1' signify? How does that value '1' after THEN affect the output?
Upvotes: 4
Views: 49
Reputation: 179244
First, note that these three expressions are equivent:
CASE WHEN elevation >= 2000 THEN 1 ELSE NULL END
IF(elevation >= 2000, 1, NULL)
((elevation >= 2000) OR NULL)
If elevation >= 2000, the expression evaluates as "1", otherwise the expression evaluates as NULL
.
"1" is conventionally used as boolean true, and you could substitute the MySQL literal TRUE
in the above expressions with equivalent results... but that isn't what the "1" is for, here.
When used with COUNT()
, in cases like this, the only real significance of 1
is that it is not NULL
.
This is important, because -- contrary to popular belief -- COUNT()
does not count rows. It counts values.
What's the difference? NULL
is not technically a value. Instead, it is a marker that signifies the absence of a value, thus COUNT(expr)
only counts rows where expr is not null.
By using an expression like the one here, you're asking the server to count the rows with elevation => 2000, and you do this by giving COUNT()
a NULL
for rows you want not to be counted... and a non-null value for rows you do.
Aggregate (GROUP BY
) functions operate on values -- and NULL
, again, is not a value in this sense.
Another aggregate function that makes this rationale perhaps even more clear is AVG()
. If you had 3 rows... with values 5, NULL
, and 10... what's the average? If you said 7.5, that's correct: the average of these 3 rows is (5 + 10) ÷ 2 = 5 because the 3 rows have only two values. NULL
is not 0, otherwise the average would be (5 + 0 + 10) ÷ 3 = 5, which it is not.
So, that's how and why this works.
How does that value '1' after
THEN
affect the output?
It really doesn't. You could just as easily have said COUNT(CASE WHEN elevation >= 2000 THEN 'cat videos are funny' ELSE NULL END)
because, just like the literal 1
, the literal string 'cat videos are funny'
is also not null, and non-null values -- anything not null -- is what count will count.
A novice might try to accomplish this task with COUNT(elevation >= 2000)
, but that gives the wrong answer, because the 0 (false) for rows where elevation is < 2000 is not null, so these rows would still be counted.
You may then ask, "why not just use COUNT(*) ... WHERE elevation >= 2000
?" Good question. The reasons vary, but if you GROUP BY state
and there are states with no rows matching WHERE
, those states would be entirely eliminated from the results, which is often not what you want. This query includes them, with a count of zero.
Note that ((elevation >= 2000) OR NULL)
, the third example expression at the top, doesn't actually need the parentheses. I included them because this form is not necessarilly intuitive at first glance. The natural precedence of operations will cause this to be evaluated correctly if written simply elevation >= 2000 OR NULL
. This expression is equivalent to the other two because elevation >= 2000
first evaluates to 1 if true, 0 if false, or NULL
if elevation is null. Then the lower-precedence OR
is evaluated, and you get one of these: 1 OR NULL => 1
... 0 OR NULL => NULL
... NULL OR NULL => NULL
... and you may actually be awarded a SQL wizard badge by the elders of the Internet at the point when writing queries with COUNT(elevation >= 2000 OR NULL)
comes naturally to you.
Upvotes: 2
Reputation: 416
Query will simply return 1 if elevation
is >
or =
2000, else it will return NULL
(this is use full for boolean representation of field because NULL represents 0),
Now returned value will be set into count_high_elevation_airports
.
Upvotes: 1