Srinivas Anant
Srinivas Anant

Reputation: 19

I am trying to set the empty values in a csv file to zero in hive. But this code doesn't seem to work. What changes should I make?

This is the input .csv file

"1","","Animation"

"2","Jumanji",""

"","Grumpier Old Men","Comedy"

Hive Code

CREATE TABLE IF NOT EXISTS movies(movie_id int, movie_name string,genre string)

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

WITH SERDEPROPERTIES (

"separatorChar" = ",",

"quoteChar" = "\"",

"serialization.null.format" = '0'

);

Output

1 Animation

2 Jumanji

    Grumpier Old Men        Comedy

Upvotes: 1

Views: 2248

Answers (1)

leftjoin
leftjoin

Reputation: 38290

Empty strings in csv are interpreted as empty strings, not NULLs. To represent NULL inside a delimited text file you should use "\N". Also Hive provides you a table property “serialization.null.format” which can be used to treat a character of your choice as null in Hive SQL. In your case it should be empty string "". To convert NULLs to zeroes use NVL(col, 0) or COALESCE(col, 0) function depending on your hive version (COALESCE should work for all).

Upvotes: 3

Related Questions