Reputation: 57
I have a Parquet file in AWS S3 and I am trying to copy its data into a Redshift table. I created the table by crawling the Parquet file in AWS Glue to generate the table DDL:
COPY table_name FROM 's3://bucket/folder/file_name.parquet'
credentials 'aws_iam_role=...'
NULL AS 'NULL'
EMPTYASNULL
delimiter ','
region 'region_name'
IGNOREHEADER 1;
I was getting the following error: ERROR:Invalid digit, Value '.', Pos 0, Type: Double
Then I changed the data type from double to numeric and now I am getting this error: ERROR:Invalid exponent, Value 'e', Pos 187, Type: Decimal
The table has text, numeric, and bigint column data types, but I do not know which column is causing this error, and I do not understand what the error message means. I would appreciate some guidance.
Upvotes: 0
Views: 1028
Reputation: 57
When copying data from a Parquet file into a Redshift table you need to specify the format with FORMAT AS PARQUET:
COPY table_name FROM 's3://bucket/folder/file_name.parquet'
credentials 'aws_iam_role=...'
FORMAT AS PARQUET;
Parquet files store data in a different way than CSV files do.
This is a field value I get when I read the Parquet file with pandas, using pd.read_parquet('path/file.parquet', engine='fastparquet'): 439.0
And this is the same value I was trying to insert into the table: ?..7.????.17?.?..v???A????
I was obviously getting many errors when trying to insert this into a column with data type double precision.
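If it helps to confirm which columns are affected, a small sketch like this (reusing the same hypothetical path as above) prints the dtype of every column so they can be compared against the DDL the Glue crawler generated:
import pandas as pd

# Read the Parquet file (same hypothetical path as above) and print each
# column's dtype, to compare against the Redshift table DDL.
df = pd.read_parquet('path/file.parquet', engine='fastparquet')
print(df.dtypes)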
Upvotes: 1
Reputation: 11092
I suspect that the data in the parquet file is stored in an exponent format. For example, the number 123 can be represented as 1.23e2 (1.23 x 10^2). I expect Redshift is not understanding this format.
If I am right, you can COPY the file into Redshift with this column as a varchar and then cast it to the desired data type.
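A sketch of that approach might look like the following; staging_table_name, raw_value, and the other table and column names are all hypothetical, and the cast target depends on the type you actually want:
-- Load the suspect column as text first, then cast once the rows are in Redshift.
-- All table and column names here are hypothetical.
CREATE TABLE staging_table_name (
    id        bigint,
    raw_value varchar(64)   -- exponent-formatted values land here as text
);

COPY staging_table_name FROM 's3://bucket/folder/file_name.parquet'
credentials 'aws_iam_role=...';

INSERT INTO table_name (id, value)
SELECT id,
       CAST(raw_value AS double precision)   -- e.g. '1.23e2' casts to 123
FROM staging_table_name;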
Upvotes: 0