Reputation: 2250
I am trying to copy some data from an S3 bucket into a Redshift table using the COPY command. The file format is Parquet. When I execute the COPY query, I get InternalError_: Spectrum Scan Error.
This is the first time I have tried copying from a Parquet file.
Please help me if there is a solution for this. I am using boto3 in Python.
Upvotes: 10
Views: 36056
Reputation: 1208
This can happen for several reasons, so the first step is to find out which one applies:
Try going into the error logs. You might find a partial log in CloudWatch. The screenshot you uploaded also shows the number of the query you ran.
Go to the AWS Redshift query editor and run the query below to get the full log:
select message
from svl_s3log
where query = '<<your query number>>'
order by query,segment,slice;
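
Since the question mentions boto3, here is a minimal sketch of running the same lookup from Python through the Redshift Data API. It is an illustration under assumptions, not the only way to do it: the helper names `build_s3log_query` and `fetch_s3log_messages` are made up for this example, the cluster identifier, database and user are placeholders you would supply, and only the first page of results is read.

```python
import time


def build_s3log_query(query_id):
    """Return the svl_s3log lookup for one query number (hypothetical helper)."""
    qid = int(query_id)  # raises ValueError for anything that is not a plain integer
    return (
        "select message from svl_s3log "
        f"where query = {qid} order by query, segment, slice;"
    )


def fetch_s3log_messages(query_id, cluster_id, database, db_user, poll_seconds=1.0):
    """Run the lookup via the Redshift Data API and return the message strings."""
    import boto3  # imported here so the SQL helper works without boto3 installed

    client = boto3.client("redshift-data")
    statement = client.execute_statement(
        ClusterIdentifier=cluster_id,  # placeholder: your cluster identifier
        Database=database,
        DbUser=db_user,
        Sql=build_s3log_query(query_id),
    )
    # The Data API is asynchronous, so poll until the statement settles.
    while True:
        desc = client.describe_statement(Id=statement["Id"])
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(poll_seconds)
    if desc["Status"] != "FINISHED":
        raise RuntimeError(desc.get("Error", desc["Status"]))
    # Only the first page is read here; longer logs need NextToken handling.
    result = client.get_statement_result(Id=statement["Id"])
    return [row[0].get("stringValue", "") for row in result["Records"]]
```

Calling `fetch_s3log_messages(12345, "my-cluster", "dev", "awsuser")` (all four values are placeholders) should return the log lines that explain the scan error.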
Hope this helps!
Upvotes: 29
Reputation: 15
Spectrum scan errors are usually caused by one of two things.
a) A column mismatch between source and destination, e.g. when copying data from S3 to Redshift, the columns in the Parquet file are not in the same order as the columns in the Redshift table.
b) A datatype mismatch between source and destination, e.g. when copying from S3 to Redshift, col1 is an Integer in the Parquet file but the same col1 is a float in Redshift.
Verifying that the column order and the datatypes match between source and destination should resolve the Spectrum Scan Error.
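
As a rough sketch of that check, the function below compares two column lists position by position and reports order and datatype mismatches. Everything here is an assumption for illustration: the function name, the hand-typed column lists, and especially the `COMPATIBLE` map, which is a small illustrative subset and not an official Parquet-to-Redshift type table. In practice the Parquet side could come from a library such as pyarrow and the Redshift side from the table definition.

```python
# Illustrative, incomplete mapping from Parquet type labels to Redshift types
# that are assumed to load cleanly. Extend it to match your own data.
COMPATIBLE = {
    "int32": {"integer"},
    "int64": {"bigint"},
    "float": {"real"},
    "double": {"double precision"},
    "string": {"varchar"},
    "boolean": {"boolean"},
}


def find_schema_mismatches(parquet_cols, redshift_cols):
    """Compare (name, type) pairs in order and return a list of problem strings."""
    problems = []
    if len(parquet_cols) != len(redshift_cols):
        problems.append(
            f"column count differs: parquet={len(parquet_cols)}, "
            f"redshift={len(redshift_cols)}"
        )
    pairs = zip(parquet_cols, redshift_cols)
    for pos, ((p_name, p_type), (r_name, r_type)) in enumerate(pairs, start=1):
        if p_name.lower() != r_name.lower():
            # Case a) from the answer: columns are not in the same order.
            problems.append(f"position {pos}: parquet '{p_name}' vs redshift '{r_name}'")
        elif r_type.lower() not in COMPATIBLE.get(p_type.lower(), set()):
            # Case b) from the answer: same column, different datatype.
            problems.append(f"column '{p_name}': parquet {p_type} vs redshift {r_type}")
    return problems


if __name__ == "__main__":
    parquet = [("id", "int64"), ("col1", "int32")]
    redshift = [("id", "bigint"), ("col1", "real")]
    for line in find_schema_mismatches(parquet, redshift):
        print(line)
```

An empty list means the two schemas line up under the assumed mapping; any entry points at the column to fix, either in the file or in the table definition.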
Upvotes: 1
Reputation: 31
This error usually indicates a compatibility problem between the data in your file and the Redshift table. You can get more insight into the error from the 'SVL_S3LOG' table. In my case it was because the file had some invalid UTF-8 characters.
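
If invalid UTF-8 is the suspect, a small check like the one below can locate the offending values before the load. This is only a sketch: `find_invalid_utf8` is a made-up helper, and it assumes you already have the string column values as raw `bytes` (a Parquet file is binary, so decoding the whole file as text would not work).

```python
def find_invalid_utf8(values):
    """Return (index, byte_offset) for each bytes value that is not valid UTF-8."""
    bad = []
    for index, raw in enumerate(values):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that failed to decode.
            bad.append((index, exc.start))
    return bad


if __name__ == "__main__":
    sample = [b"ok", b"caf\xc3\xa9", b"bad\xff"]
    print(find_invalid_utf8(sample))
```

Each reported index identifies a value to clean or re-encode before writing the Parquet file again.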
Upvotes: 3