Achyut Muley

Reputation: 43

How to include the escape character as part of a column value in a CSV file when using Apache Drill?

I have a CSV file like this:

"id"^"first_name"^"last_name"^"email"^"gender"
"1"^"John"^"143 \\"^"[email protected]"^"Male"
"2"^"Willaim"^"Khan"^"[email protected]"^"Male"

If I execute any Drill query on this file, I get the following error: "UserRemoteException : DATA_READ ERROR: Unexpected character '101' following quoted value of CSV field. Expecting '94'. Cannot parse CSV input."

But with a CSV file like this:

^id^|^first_name^|^last_name^|^email^|^gender^
^1^|^John^|^Bharadwaj \\^|^[email protected]^|^Male^
^2^|^Willaim^|^Khan^|^[email protected]^|^Male^

Everything works fine.

This is my dfs configuration for CSV in Apache Drill. I am using version 1.21.1:

"csv": {
  "type": "text",
  "extensions": [ "csv" ],
  "lineDelimiter": "\n",
  "fieldDelimiter": "^",
  "quote": "\"",
  "escape": "\\",
  "comment": "#",
  "extractHeader": true
}

There seems to be an issue when the escape character appears immediately before the closing quote. I changed the escape value to ~ and observed the same issue. Any insights?

Upvotes: 0

Views: 56

Answers (1)

Dzamo Norton

Reputation: 1389

It looks like the Drill CSV parser uses the configured escape character only to escape quote characters; in particular, the escape character cannot escape an occurrence of itself. I'm not sure why this limitation exists.
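To illustrate that behaviour, here is a minimal Python sketch of a toy parser that follows the same rule: the escape character only has an effect when it directly precedes a quote. This is not Drill's actual code. The function name split_quoted_row is hypothetical, and the sample rows leave out the email column for brevity. Under that assumption, the second backslash in "143 \\" escapes the closing quote, so the field runs on to the next quote and the parser then finds an ordinary character where it expects the delimiter '^' (code 94).

```python
def split_quoted_row(line, delimiter="^", quote='"', escape="\\"):
    """Toy model (not Drill's parser): escape only protects a following quote."""
    fields, i, n = [], 0, len(line)
    while i < n:
        if line[i] != quote:
            raise ValueError(f"expected opening quote at position {i}")
        i += 1  # skip the opening quote
        value = []
        while True:
            if i >= n:
                raise ValueError("unterminated quoted field")
            ch = line[i]
            if ch == escape and i + 1 < n and line[i + 1] == quote:
                value.append(quote)  # escape + quote yields a literal quote
                i += 2
            elif ch == quote:
                i += 1  # closing quote
                break
            else:
                value.append(ch)  # an escape not followed by a quote stays literal
                i += 1
        fields.append("".join(value))
        if i < n:
            if line[i] != delimiter:
                raise ValueError(
                    f"unexpected character '{ord(line[i])}' after quoted value, "
                    f"expecting '{ord(delimiter)}'"
                )
            i += 1  # skip the delimiter
    return fields


# A row without a trailing escape parses as expected.
print(split_quoted_row('"2"^"Willaim"^"Khan"^"Male"'))

# A value ending in the escape character swallows the closing quote.
try:
    split_quoted_row(r'"1"^"John"^"143 \\"^"Male"')
except ValueError as err:
    print(err)
```

In this model the failure depends only on an escape character sitting directly before the closing quote, which would also explain why switching the escape character to ~ does not help.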

Upvotes: 0
