Reputation: 277
I'm getting a 1202 "Extra column(s) found" error in Redshift when trying to load a simple CSV. I've made sure that there are no additional columns and no unescaped characters in the file that would cause the COPY command to fail with this error.
Here's the created target table:
create table test_table(
name varchar(500),
email varchar(500),
developer_id integer,
developer_name varchar(500),
country varchar(20),
devdatabase varchar(50));
I'm using a simple CSV with no header and only 3 rows of data:
john smith,[email protected],123,johndev,US,comet
jane smith,[email protected],124,janedev,GB,titan
jack smith,[email protected],125,jackdev,US,comet
Unfortunately my COPY command fails with err_1202 "Extra column(s) found".
COPY test_table
FROM 's3://mybucket/test/test_contacts.csv'
WITH credentials AS 'aws_access_key_id=<awskey>;aws_secret_access_key=<mykey>'
CSV;
There are no additional columns in the file.
Upvotes: 8
Views: 20726
Reputation: 179
This mostly happens because you are using CSV format, which by default has ',' as the delimiter, and your data contains fields whose values include ','. Those embedded commas produce extra columns when the data is loaded into Redshift; for example, a name value like "smith, john" splits into two fields, giving a seven-column row against a six-column table. There are quite a few ways to fix this, and it is mostly easy once you have identified which column has commas in its values. You can find the offending rows by looking at stl_load_errors:
SELECT starttime, err_reason, raw_line, err_code, query, session, tbl
FROM stl_load_errors
WHERE filename LIKE 's3://mybucket/test/%'
ORDER BY query DESC, starttime DESC;
Then fix the column that has the extra commas. Let's say in this example the 'name' column contains commas; we can clean that data with PySpark:
from pyspark.sql import functions as F

# Replace embedded commas in the offending column with spaces
df = df.withColumn('name', F.regexp_replace(F.col('name'), ',', ' '))
Store the new dataframe in S3 and then use the COPY command below to load it into Redshift.
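For example, the write-back step might look like this (a minimal sketch; the output path s3://mybucket/test/cleaned/ is an assumption, and writing a header line matches the IGNOREHEADER 1 option in the COPY below):
# Write the cleaned dataframe back to S3 as CSV (path is illustrative)
(df.write
    .mode('overwrite')
    .option('header', 'true')
    .csv('s3://mybucket/test/cleaned/'))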
COPY table_name
FROM 's3 path'
IAM_ROLE 'iam role'
DELIMITER ','
ESCAPE
IGNOREHEADER 1
MAXERROR AS 5
COMPUPDATE FALSE
ACCEPTINVCHARS
ACCEPTANYDATE
FILLRECORD
EMPTYASNULL
BLANKSASNULL
NULL AS 'null';
Upvotes: 1
Reputation: 4829
Note that Glue is not as robust as one might think; column order plays a major role. Check the order of your target table as well as the table input, and make sure the column order and data types are identical. See the AWS Glue Developer Guide for more info.
In addition, make sure you have disabled 'Job bookmark' in the 'Job details' tab; for any development or generic job this is a major source of headaches and trouble.
Upvotes: 0
Reputation: 157
For me, it turned out that I had executed the scripts against the wrong database within the cluster.
Upvotes: -2
Reputation: 20291
Make sure the correct delimiter is specified in the COPY statement (and in the source files). I ran into the same issue. After a couple of attempts with different delimiters (while unloading a table to S3 files, then copying into another table from those files), I was able to solve the issue by using the delimiter '\t'. Here is the full example in my case:
copy <TABLE-NAME>
from 's3://<FILES/LOCATION>'
access_key_id '<INSERT>'
secret_access_key '<INSERT>'
delimiter '\t'
ignoreheader 1
maxerror 10;
Upvotes: 1
Reputation: 206
I was also facing the same issue while loading the data. I rectified it using the following COPY command:
copy yourtablename
from 'your S3 Locations'
credentials 'your AWS credentials'
delimiter ','
ignoreheader 1
removequotes
emptyasnull
blanksasnull
maxerror 5;
Upvotes: 13
Reputation: 41
Try this:
COPY test_table
FROM 's3://mybucket/test/test_contacts.csv'
WITH credentials AS 'aws_access_key_id=<awskey>;aws_secret_access_key=<mykey>'
delimiter ','
ignoreheader as 1
emptyasnull
blanksasnull
removequotes
escape;
Upvotes: 4