Chirag
Chirag

Reputation: 335

Convert Blank to NULL in Hive

I am trying to convert blank values in the source file to NULL in the hive table by setting the property 'serialization.null.format' = '' . The query I have written in hive is:

create table test(a int, b string) stored as parquet TBLPROPERTIES('serialization.null.format'='');

And then insert values into this through impala something like this:

insert overwrite table test values (1, ''), (2, 'b');

The result of this shows something like this:

| a | b |


| 1 |   |

| 2 | b |

Can someone help me out here as to why is the blank not getting converted to NULL ?

Upvotes: 3

Views: 13019

Answers (3)

Tony79
Tony79

Reputation: 21

This will do the trick : nullif(trim(b),'') Will give the b or NULL value when blank. So when selecting the statement you can do

select a,nullif(trim(b),'') from test;

FYR: nullif( value 1, value 2 ) Returns NULL if value 1 = value 2; otherwise returns value 1 (as of Hive 2.3.0). Shorthand for: CASE WHEN value 1 = value 2 then NULL else value 1

https://www.docs4dev.com/docs/en/apache-hive/3.1.1/reference/LanguageManual_UDF.html

Cheers!!

Upvotes: 0

Aaron Faltesek
Aaron Faltesek

Reputation: 349

You could try inserting into the table using a statement like this:

CASE    
when TRIM(a) = ''
THEN NULL
ELSE a
END,

Upvotes: 0

Umberto Griffo
Umberto Griffo

Reputation: 931

The problem is the Parquet SerDe. See the issue at https://issues.apache.org/jira/browse/HIVE-12362.

The description is as follows:

create table src (a string);
insert into table src values (NULL), (''), ('');

0: jdbc:hive2://localhost:10000/default> select * from src;
+-----------+--+
| src.a  |
+-----------+--+
| NULL      |
|                |
|                |
+-----------+--+

create table dest (a string) row format serde 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' stored as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';

alter table dest set SERDEPROPERTIES ('serialization.null.format' = '');
alter table dest set TBLPROPERTIES ('serialization.null.format' = '');
insert overwrite table dest select * from src;

0: jdbc:hive2://localhost:10000/default> select * from test11;
+-----------+--+
| test11.a  |
+-----------+--+
| NULL      |
|                |
|                |
+-----------+--+

Upvotes: 1

Related Questions