Rohan Nayak
Rohan Nayak

Reputation: 233

How to format timestamp field while loading parquet file to redshift?

I have bunch of parquet files with timestamp fields in different format. I would like to unify the formats while loading the data into redshift table. But below syntax is not working and it throws error TIMEFORMAT argument is not supported for PARQUET based COPY.

Any work arounds ? I can think of only one solution which is unify all timestamp formats while creating parquet file.Since I have to load 2-3 years of parquet files , so this solution is time consuming. Could any one let me know if you have any other solution OR anything missing in below syntax

Syntax :

COPY redshift_table_name
FROM 's3://path/to/files'
IAM_ROLE '*****'
TIMEFORMAT AS 'YYYY-MM-DDTHH:MI:SS'
FORMAT AS PARQUET

Upvotes: 1

Views: 2477

Answers (2)

fatimazahramoussaid
fatimazahramoussaid

Reputation: 11

When we use Parquet, the only TimeStamp format acceptable (until this day) is the unix_timestamp (Bingint of number of seconds ex : 1608572074000 and NOT 2020-12-21 17:34:34+00)

The schema / metadata, use TimestampType

Upvotes: 0

DeeDee
DeeDee

Reputation: 2761

If you can use the Pyarrow library, load the parquet tables and then write them back out in Parquet format using the use_deprecated_int96_timestamps parameter. Redshift will correctly recognize those. I haven't had any luck getting it to properly recognize any other timestamp formats when loading Parquet.

Upvotes: 2

Related Questions