Jgreen727

Reputation: 75

OpenCSVSerde escapeChar overriding quoteChar

I have a number of CSVs that I'm importing into Hive, and I've found that my escapeChar, a newline, is triggered even when it appears inside a quoted field (delimited by my quoteChar). Is there a straightforward way around this?

Line1field1 text,Line1field2 text,"Line1field3 text \n with new line"\n
Line2field1 text,"Line2field2 text, with comma"

Upvotes: 2

Views: 521

Answers (1)

leftjoin

Reputation: 38290

There is no way to fix this with a text format in Hive.

OpenCSVSerDe does not handle embedded newlines; see this documentation.

In Hive, text formats such as CSV and JSON do not allow embedded newlines, and SerDes that work with text formats, such as RegexSerDe, OpenCSVSerDe, JSONSerDe, and LazySimpleSerDe, do not handle them.

You can store embedded newlines in binary formats (ORC, Parquet, Avro), but in some query tools the newlines will still cause line breaks and shifted output. With a binary format you can at least replace the newlines with something else in the query itself. For text formats this is not possible, because the record reader splits the input into lines and the SerDe receives each line separately.
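To illustrate the difference, here is a rough sketch in Python (not Hive code, only an analogy). It assumes a shortened version of the sample from the question, and compares plain line splitting, which is roughly what a line-based record reader does, with a quote-aware CSV parser that sees the whole stream.

```python
import csv
import io

# Two logical records; the first has a newline inside a quoted field.
data = 'a,b,"c \n with new line"\nd,"e, with comma"\n'

# Line-oriented reading: every newline ends a record, quoted or not.
physical_lines = data.splitlines()

# Quote-aware parsing over the whole stream keeps the quoted newline
# inside its field.
logical_records = list(csv.reader(io.StringIO(data)))

print(len(physical_lines))   # 3 physical lines
print(len(logical_records))  # 2 logical records
```

By the time the SerDe is called, the split into three lines has already happened, so no quoteChar or escapeChar setting can put the record back together.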

The solution is to transform the CSV before loading it into Hive, replacing the newlines with something else, or to use a binary format if possible.
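One possible way to do the transformation is a small Python script using the standard csv module, which does understand quoted newlines. This is only a sketch: the function name flatten_newlines and the choice of a single space as the replacement are my assumptions, and it expects standard double-quote quoting and comma delimiters.

```python
import csv
import io

def flatten_newlines(src, dst, replacement=" "):
    """Copy CSV rows from src to dst, replacing newlines inside fields.

    flatten_newlines is a hypothetical helper, not part of Hive or any library.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        cleaned_row = []
        for field in row:
            # Handle Windows, Unix and old Mac line endings inside a field.
            for newline in ("\r\n", "\n", "\r"):
                field = field.replace(newline, replacement)
            cleaned_row.append(field)
        writer.writerow(cleaned_row)

# Sample data from the question, with a real newline inside the quoted field.
raw = (
    'Line1field1 text,Line1field2 text,"Line1field3 text \n with new line"\n'
    'Line2field1 text,"Line2field2 text, with comma"\n'
)
out = io.StringIO()
flatten_newlines(io.StringIO(raw), out)
cleaned = out.getvalue()
print(cleaned)
```

After this, every record sits on exactly one physical line, and the field containing a comma is still quoted, so OpenCSVSerDe can read the result. When working with real files, open both of them with newline="" as the csv module documentation recommends.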

Upvotes: 2
