pramav
pramav

Reputation: 493

handling newline character in hive

I have created a table in hive as

Create table(id int, Description String)  

My data looks something as follows :

 
1|This will return corrupt data since there is a ',' in the first string.
     some text
     Change the data  
2|There is prob in reading data 
    sometext

After the data is loaded into hive since the default line terminator is \n, the description column cannot be read by hive, Hence it displays a NULL value. Can anyone suggest how to handle newline before loading into hive.

Upvotes: 3

Views: 12027

Answers (3)

prasad
prasad

Reputation: 359

try adding the below property in hive-site.xml or you can just try for temporary hive session level.

hive.query.result.fileformat=SequenceFile

Upvotes: 1

Brian Schrameck
Brian Schrameck

Reputation: 608

I know this question is old, but you have a couple of options. You can't control this with FIELDS TERMINATED BY, because that only controls what terminates the fields, not the records. Records in Hive are hard-coded to be terminated by the newline character (even though there is a LINES TERMINATED BY clause, it is not implemented).

  1. Write a custom InputFormat that uses a RecordReader that understands non-newline delimited records. Look at the code for LineReader/LineRecordReader and TextInputFormat.
  2. Use a format other than text/ASCII, like Parquet. I would recommend this regardless, as text is probably the worst format you can store data in anyway.

Upvotes: 3

K S Nidhin
K S Nidhin

Reputation: 2650

By default hive takes in NEWLINE ('\N') as delimiter . You can change the delimiter using:

    ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

Upvotes: -3

Related Questions