Naveen Karnam
Naveen Karnam

Reputation: 493

Expression "IS NOT NULL" not working on HQL

When I do a select statement for non-null values on a hive table, there are no correct results in the response. The result is as if "is not null" expression is not there!

Example:

select count(*)
from test_table
where test_col_id1='12345' and test_col_id2 is not null;

Note test_col_id1 and test_col_id2 are not partition keys.

Here's my hive version.

Hive 0.14.0.2.2.0.0-2041

Here's the table:

... | test_col_id1 | test_col_id2 |
... | 12345 | x |
... | 12345 | NULL |

This query returns 2 records.

Upvotes: 9

Views: 68634

Answers (2)

Jing He
Jing He

Reputation: 914

Hive parse NULL value in a different way.

Hive does not consider empty value is NULL. However, by default, Hive uses \N to represent NULL (not a special character, just backslash plus capital letter N). If you just want to filter out all the empty values, you can use

where test_col_id2 != ''

If you use the real NULL value in your original text file which was loaded into hive table, you can try

where test_col_id2 != '\000'

since the ASCII Code of NULL in octonary is 00 and the first "0" means it's an octonary number.

By the way, you can also use command:

alter table $YOUR_TABLE SETSERDEPROPERTIES('serialization.null.format' ='abc')

to customize the NULL value to "abc" if you want Hive to parse "abc" as the NULL value for some table.

Upvotes: 4

Edward R. Mazurek
Edward R. Mazurek

Reputation: 2177

Try the following query, does it return rows?

select count(*)
from test_table
where test_col_id1='12345' and test_col_id2 != 'NULL';

Then your NULL is not NULL, it's the string 'NULL'. Loads of people have problems with Hive treatment of NULL strings. By default, it's the blank string ''. If we want anything else, we have to specify exactly the way NULL strings should be treated when we create the table. Here are 3 examples of how to change what is recognized as NULL. The first one sets 'NULL' strings as NULL:

CREATE TABLE nulltest1 (id STRING, another_string STRING)
TBLPROPERTIES('serialization.null.format'='NULL') --sets the string 'NULL' as NULL;
CREATE TABLE nulltest2 (id STRING, another_string STRING)
TBLPROPERTIES('serialization.null.format'='') --sets empty string as NULL;
CREATE TABLE nulltest3 (id STRING, another_string STRING)
TBLPROPERTIES('serialization.null.format'='\N'); --sets \N as NULL;

Since you've already created your table, you can alter your table so that it will recognize your 'NULL' as NULL:

ALTER TABLE test_table SET TBLPROPERTIES ('serialization.null.format' = 'NULL');

Upvotes: 12

Related Questions