Reputation: 73
table1 data sample:
year month day utmsource
2017 03 26 NULL
2017 03 27 NULL
2017 03 27 facebook
2017 03 27 newsletter
2017 03 27 banner
2017 03 27 facebook
expected selection:
year month day utmsource
2017 03 27 NULL
2017 03 27 newsletter
2017 03 27 banner
My Hive queries:
-- result = 0, it did not include the NULL utmsource record
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM table1
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource="facebook"
-- result = 1 the NULL utmsource record is included
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM table1
WHERE year=2017 AND month=03 AND day=27 AND (utmsource IS NULL OR NOT utmsource="facebook")
-- also returns 0, the NULL utmsource record is not included
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM table1
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource <=> 'facebook';
Questions:
Upvotes: 1
Views: 1844
Reputation: 1269503
What you want is a NULL
-safe equality (or inequality) operator. In ANSI SQL, there is an operator called is distinct from
. Hive appears to use the MySQL version, which is <=>
. So, you can do:
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM tablename
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource <=> 'facebook';
This operator is described in the documentation.
I should also point out that you might find this a simpler formulation for the SELECT
:
SELECT (COUNT(*) - COUNT(utmsource)) as amountnull
FROM tablename
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource <=> 'facebook';
Although, overall, this would appear to be the simplest:
SELECT COUNT(*)as amountnull
FROM tablename
WHERE year=2017 AND month=03 AND day=27 AND utmsource IS NULL;
The comparison to 'Facebook'
is unnecessary.
Upvotes: 4