Reputation: 233
I have a bunch of parquet files with timestamp fields in different formats. I would like to unify the formats while loading the data into a Redshift table, but the syntax below is not working and throws the error TIMEFORMAT argument is not supported for PARQUET based COPY.
Are there any workarounds? The only solution I can think of is to unify all the timestamp formats while creating the parquet files. Since I have to load 2-3 years of parquet files, that solution is time consuming. Could anyone let me know if there is another solution, or whether anything is missing in the syntax below?
Syntax:
COPY redshift_table_name
FROM 's3://path/to/files'
IAM_ROLE '*****'
TIMEFORMAT AS 'YYYY-MM-DDTHH:MI:SS'
FORMAT AS PARQUET
Upvotes: 1
Views: 2477
Reputation: 11
When we use Parquet, the only timestamp format accepted (as of today) is the Unix timestamp (a BIGINT epoch value, e.g. 1608572074000, and NOT 2020-12-21 17:34:34+00).
The schema/metadata uses TimestampType.
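For illustration, a minimal sketch of converting a timestamp column to an epoch BIGINT before writing Parquet, assuming pandas is available; the file and column names (events.parquet, event_ts) are placeholders:

import pandas as pd

# Placeholder input file and column name
df = pd.read_parquet("events.parquet")

# Store the timestamp as whole seconds since the Unix epoch (int64 / BIGINT)
# (assumes a timezone-naive datetime64 column)
df["event_ts"] = (df["event_ts"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

df.to_parquet("events_epoch.parquet", index=False)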
Upvotes: 0
Reputation: 2761
If you can use the PyArrow library, load the parquet tables and then write them back out in Parquet format using the use_deprecated_int96_timestamps parameter. Redshift will correctly recognize those. I haven't had any luck getting it to properly recognize any other timestamp formats when loading Parquet.
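A minimal sketch of that rewrite, assuming PyArrow is installed; the file paths are placeholders:

import pyarrow.parquet as pq

# Placeholder paths
table = pq.read_table("input.parquet")

# Write the timestamps back as INT96, which Redshift's Parquet COPY accepts
pq.write_table(
    table,
    "output_int96.parquet",
    use_deprecated_int96_timestamps=True,
)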
Upvotes: 2