AKC
AKC

Reputation: 1023

Empty String is not treated as null in Hive

My understanding of the following statement is that if blank or empty string is inserted into hive column, it will be treated as null.

TBLPROPERTIES('serialization.null.format'=''

To test the functionality i have created a table and insertted '' to the filed 3. When i query for nulls on the field3, there are no rows with that criteria.

Is my understanding of making blank string to null correct??

CREATE TABLE CDR
(
field1                 string,
field2                 string,
field3                 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
**TBLPROPERTIES('serialization.null.format'='');**

insert overwrite table emmtest.cdr select **field1,field2,''** from emmtest.cdr_non_orc;

select * from emmtest.cdr where **field3 is null;**

The last statement has not returned any rows. But i am expecting all rows to be returned since there is blank string in field3.

Upvotes: 6

Views: 53839

Answers (2)

John R. Martinez
John R. Martinez

Reputation: 67

You can use the following in your Hive Query properties:

NULL DEFINED AS ''

or any character inside the quotes.

Upvotes: 1

David דודו Markovitz
David דודו Markovitz

Reputation: 44991

TBLPROPERTIES('serialization.null.format'='') means the following:

  • An empty field in the data files will be treated as NULL when you query the table
  • When inserting rows to the table, NULL values will be written to the data files as empty fields

You are doing something else -
You are inserting an empty string to a table from a query.
It is treated "as is" - an empty string.

Demo

bash

hdfs dfs -mkdir /user/hive/warehouse/mytable
echo Hello,,World | hdfs dfs -put - /user/hive/warehouse/mytable/data.txt

hive

create table mytable (s1 string,s2 string,s3 string) 
row format delimited 
fields terminated by ','
;

hive> select * from mytable;
OK
s1  s2  s3
Hello       World

hive> alter table mytable set tblproperties ('serialization.null.format'='');
OK

hive> select * from mytable;
OK
s1  s2  s3
Hello   NULL    World

Upvotes: 10

Related Questions