Chhaya Vishwakarma
Chhaya Vishwakarma

Reputation: 1437

Replacing NULLs in hive

I have a table in hive which has coluns with 'null' as data. I want to replace 'null' with 'N/A' tried using COALESCE(col_name,'N/A') but It's not working I used if and it worked

if(col_name='null','N/A',col_name)

I have million of values in the column , will this 'if' affect the performance? if yes what can be best possible way to handle this situation?

Kindly suggest

Upvotes: 0

Views: 3130

Answers (2)

djmdata
djmdata

Reputation: 108

I suspect your column is not a true NULL value (i.e., the absence of a value), but rather is the character string equal to "null". Perhaps this is how the data was loaded from a source system?

If that is true, then the COALESCE(col_name,'N/A') function would always return the value of col_name because that is the first non-NULL value listed in the function.

Try a query like:

SELECT COUNT(*) FROM table_name WHERE col_name IS NULL;

This will show you if the column actually has NULL values or not as you have Hive configured.

Note that you can alter the behavior of Hive by overriding the default NULL behavior with:

ALTER table_name SET TBLPROPERTIES('serialization.null.format'='your_value')

Also note if you have a partitioned table, the above SET statement must be used on each partition definition - the partitions will not use the table-level setting.

Upvotes: 1

gbharat
gbharat

Reputation: 276

Using IF condition won't affect your performance. Whatever conditional function you choose(CASE, NVL,IF etc) it need to iterate through all records to check that condition. Go ahead and use IF condition

Upvotes: 1

Related Questions