Reputation: 493
When I do a select statement for non-null values on a hive table, there are no correct results in the response. The result is as if "is not null" expression is not there!
Example:
select count(*)
from test_table
where test_col_id1='12345' and test_col_id2 is not null;
Note test_col_id1
and test_col_id2
are not partition keys.
Here's my hive version.
Hive 0.14.0.2.2.0.0-2041
Here's the table:
... | test_col_id1 | test_col_id2 |
... | 12345 | x |
... | 12345 | NULL |
This query returns 2 records.
Upvotes: 9
Views: 68634
Reputation: 914
Hive parse NULL value in a different way.
Hive does not consider empty value is NULL. However, by default, Hive uses \N
to represent
NULL (not a special character, just backslash plus capital letter N).
If you just want to filter out all the empty values, you can use
where test_col_id2 != ''
If you use the real NULL value in your original text file which was loaded into hive table, you can try
where test_col_id2 != '\000'
since the ASCII Code of NULL in octonary is 00 and the first "0" means it's an octonary number.
By the way, you can also use command:
alter table $YOUR_TABLE SETSERDEPROPERTIES('serialization.null.format' ='abc')
to customize the NULL value to "abc" if you want Hive to parse "abc" as the NULL value for some table.
Upvotes: 4
Reputation: 2177
Try the following query, does it return rows?
select count(*)
from test_table
where test_col_id1='12345' and test_col_id2 != 'NULL';
Then your NULL
is not NULL
, it's the string 'NULL'. Loads of people have problems with Hive treatment of NULL
strings. By default, it's the blank string ''
. If we want anything else, we have to specify exactly the way NULL strings should be treated when we create the table. Here are 3 examples of how to change what is recognized as NULL
. The first one sets 'NULL' strings as NULL
:
CREATE TABLE nulltest1 (id STRING, another_string STRING)
TBLPROPERTIES('serialization.null.format'='NULL') --sets the string 'NULL' as NULL;
CREATE TABLE nulltest2 (id STRING, another_string STRING)
TBLPROPERTIES('serialization.null.format'='') --sets empty string as NULL;
CREATE TABLE nulltest3 (id STRING, another_string STRING)
TBLPROPERTIES('serialization.null.format'='\N'); --sets \N as NULL;
Since you've already created your table, you can alter your table so that it will recognize your 'NULL'
as NULL
:
ALTER TABLE test_table SET TBLPROPERTIES ('serialization.null.format' = 'NULL');
Upvotes: 12