Ravi
Ravi

Reputation: 3303

Handling NULL values in Hive

I am trying to create a table (table 2) in Hive from another table (table 1). I am trying to exclude certain rows with NULL values and tried the following condition.

    insert overwrite table table2 partition (date = '2013-06-01')
    select column1, column 2....
    from table1
    where column1 is not NULL or column1 <> '';

However, when I try this following query with the new table I get 300+ rows with NULL vaues:

    select count(*) from table2 where column1 is NULL;

Could someone point to what is causing these NULL values?

Thank you.

Ravi

Upvotes: 18

Views: 209035

Answers (6)

camash
camash

Reputation: 3

I use below sql to exclude the null string and empty string lines.

select * from table where length(nvl(column1,0))>0

Because, the length of empty string is 0.

select length('');
+-----------+--+
| length()  |
+-----------+--+
| 0         |
+-----------+--+

Upvotes: 0

ShikharDua
ShikharDua

Reputation: 10039

Try to include length > 0 as well.

column1 is not NULL AND column1 <> '' AND length(column1) > 0 

Upvotes: 8

LxsScarredCrest
LxsScarredCrest

Reputation: 45

Try using isnull(a), isnotnull(a), nvl(), etc. On some versions(potentially in conjunction with the server settings - atleast with the one I am working on) of hive the 'IS NULL' and 'IS NOT NULL' syntax does not execute the logic when it is compiled. Check here for more information.

Upvotes: 1

Indrajeet Gour
Indrajeet Gour

Reputation: 4510

To check for the NULL data for column1 and consider your datatype of it is String, you could use below command :

select * from tbl_name where column1 is null or column1 <> '';

Upvotes: 0

Jickson T George
Jickson T George

Reputation: 318

What is the datatype for column1 in your Hive table? Please note that if your column is STRING it won't be having a NULL value even though your external file does not have any data for that column.

Upvotes: 2

ruakh
ruakh

Reputation: 183564

Firstly — I don't think column1 is not NULL or column1 <> '' makes very much sense. Maybe you meant to write column1 is not NULL and column1 <> '' (AND instead of OR)?

Secondly — because of Hive's "schema on read" approach to table definitions, invalid values will be converted to NULL when you read from them. So, for example, if table1.column1 is of type STRING and table2.column1 is of type INT, then I don't think that table1.column1 IS NOT NULL is enough to guarantee that table2.column1 IS NOT NULL. (I'm not sure about this, though.)

Upvotes: 31

Related Questions