Reputation: 1437
I have a table in hive which has coluns with 'null' as data. I want to replace 'null' with 'N/A' tried using COALESCE(col_name,'N/A') but It's not working I used if and it worked
if(col_name='null','N/A',col_name)
I have million of values in the column , will this 'if' affect the performance? if yes what can be best possible way to handle this situation?
Kindly suggest
Upvotes: 0
Views: 3130
Reputation: 108
I suspect your column is not a true NULL value (i.e., the absence of a value), but rather is the character string equal to "null". Perhaps this is how the data was loaded from a source system?
If that is true, then the COALESCE(col_name,'N/A')
function would always return the value of col_name because that is the first non-NULL value listed in the function.
Try a query like:
SELECT COUNT(*) FROM table_name WHERE col_name IS NULL;
This will show you if the column actually has NULL values or not as you have Hive configured.
Note that you can alter the behavior of Hive by overriding the default NULL behavior with:
ALTER table_name SET TBLPROPERTIES('serialization.null.format'='your_value')
Also note if you have a partitioned table, the above SET statement must be used on each partition definition - the partitions will not use the table-level setting.
Upvotes: 1
Reputation: 276
Using IF condition won't affect your performance. Whatever conditional function you choose(CASE, NVL,IF etc) it need to iterate through all records to check that condition. Go ahead and use IF condition
Upvotes: 1