indus73
indus73

Reputation: 1

Error while creating Parquet table from CSV using Apache Drill

I'm Trying to create a Parquet table from a CSV extract (generated from an Oracle database table) that has over a million rows. about 25 of those rows have null values for the START_DATE and CTAS is failing to interpret "" as null. Any suggestions would be greatly appreciated.

CREATE TABLE dfs.tmp.FOO as
select cast(columns[0] as INT) as `PRODUCT_ID`,
cast(columns[1] as INT) as `LEG_ID`,
columns[2] as `LEG_TYPE`,
to_timestamp(columns[3], 'dd-MMM-yy HH.mm.ss.SSSSSS a') as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`;



Error: SYSTEM ERROR: IllegalArgumentException: Invalid format ""

Upvotes: 0

Views: 545

Answers (2)

Srihari Karanth
Srihari Karanth

Reputation: 2167

You can also use NULLIF() function as below

CREATE TABLE dfs.tmp.FOO as
select cast(columns[0] as INT) as `PRODUCT_ID`,
cast(columns[1] as INT) as `LEG_ID`,
columns[2] as `LEG_TYPE`,
to_timestamp(NULLIF(columns[3],''), 'dd-MMM-yy HH.mm.ss.SSSSSS a') as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`;

NULLIF will convert empty string to null and the casting won't fail.

Upvotes: 0

Chris Matta
Chris Matta

Reputation: 3433

You can always include a CASE statement to filter out the empty entries:

CREATE TABLE dfs.tmp.FOO as
select cast(columns[0] as INT) as `PRODUCT_ID`,
cast(columns[1] as INT) as `LEG_ID`,
columns[2] as `LEG_TYPE`,
CASE WHEN columns[3] = '' THEN null
  ELSE to_timestamp(columns[3], 'dd-MMM-yy HH.mm.ss.SSSSSS a') 
END as `START_DATE`
from dfs.`c:\work\prod\data\foo.csv`;

Upvotes: 0

Related Questions