Reputation: 3303
I am trying to create a table (table 2) in Hive from another table (table 1). I am trying to exclude certain rows with NULL values and tried the following condition.
insert overwrite table table2 partition (date = '2013-06-01')
select column1, column 2....
from table1
where column1 is not NULL or column1 <> '';
However, when I try this following query with the new table I get 300+ rows with NULL vaues:
select count(*) from table2 where column1 is NULL;
Could someone point to what is causing these NULL values?
Thank you.
Ravi
Upvotes: 18
Views: 209035
Reputation: 3
I use below sql to exclude the null string and empty string lines.
select * from table where length(nvl(column1,0))>0
Because, the length of empty string is 0.
select length('');
+-----------+--+
| length() |
+-----------+--+
| 0 |
+-----------+--+
Upvotes: 0
Reputation: 10039
Try to include length > 0 as well.
column1 is not NULL AND column1 <> '' AND length(column1) > 0
Upvotes: 8
Reputation: 45
Try using isnull(a), isnotnull(a), nvl(), etc. On some versions(potentially in conjunction with the server settings - atleast with the one I am working on) of hive the 'IS NULL' and 'IS NOT NULL' syntax does not execute the logic when it is compiled. Check here for more information.
Upvotes: 1
Reputation: 4510
To check for the NULL data for column1 and consider your datatype of it is String, you could use below command :
select * from tbl_name where column1 is null or column1 <> '';
Upvotes: 0
Reputation: 318
What is the datatype for column1 in your Hive table? Please note that if your column is STRING it won't be having a NULL value even though your external file does not have any data for that column.
Upvotes: 2
Reputation: 183564
Firstly — I don't think column1 is not NULL or column1 <> ''
makes very much sense. Maybe you meant to write column1 is not NULL and column1 <> ''
(AND
instead of OR
)?
Secondly — because of Hive's "schema on read" approach to table definitions, invalid values will be converted to NULL
when you read from them. So, for example, if table1.column1
is of type STRING
and table2.column1
is of type INT
, then I don't think that table1.column1 IS NOT NULL
is enough to guarantee that table2.column1 IS NOT NULL
. (I'm not sure about this, though.)
Upvotes: 31