Reputation: 277
I'm getting a 1202 "Extra column(s) found" error in Redshift when trying to load a simple CSV. I've made sure that there are no additional columns and no unescaped characters in the file that would cause the COPY command to fail with this error.
Here's the created target table:
create table test_table(
name varchar(500),
email varchar(500),
developer_id integer,
developer_name varchar(500),
country varchar(20),
devdatabase varchar(50));
I'm using a simple CSV with no header and only 3 rows of data:
john smith,[email protected],123,johndev,US,comet
jane smith,[email protected],124,janedev,GB,titan
jack smith,[email protected],125,jackdev,US,comet
Unfortunately my COPY command fails with err_1202 "Extra column(s) found".
COPY test_table
FROM 's3://mybucket/test/test_contacts.csv'
WITH credentials AS 'aws_access_key_id=<awskey>;aws_secret_access_key=<mykey>'
CSV;
There are no additional columns in the file.
Upvotes: 8
Views: 20726
Reputation: 179
This mostly happens because you are using CSV format, which by default has ',' as the delimiter, and your data contains fields whose values include ','. Those embedded commas produce extra columns when the data is loaded into Redshift; for example, a name value like "smith, john" splits into two fields, giving a seven-column row against a six-column table. There are quite a few ways to fix this, and it is mostly easy once you have identified which column has commas in its values. You can find the offending rows by looking at stl_load_errors:
SELECT starttime, err_reason, raw_line, err_code, query, session, tbl
FROM stl_load_errors
WHERE filename LIKE 's3://mybucket/test/%'
ORDER BY query DESC, starttime DESC;
Then fix the column that has the extra commas. Let's say in this example the 'name' column contains commas; we can clean that data with PySpark:
from pyspark.sql import functions as F

# Replace embedded commas in the offending column with spaces
df = df.withColumn('name', F.regexp_replace(F.col('name'), ',', ' '))
Store the new dataframe in S3 and then use the COPY command below to load it into Redshift.
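For example, the write-back step might look like this (a minimal sketch; the output path s3://mybucket/test/cleaned/ is an assumption, and writing a header line matches the IGNOREHEADER 1 option in the COPY below):
# Write the cleaned dataframe back to S3 as CSV (path is illustrative)
(df.write
    .mode('overwrite')
    .option('header', 'true')
    .csv('s3://mybucket/test/cleaned/'))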
COPY table_name
FROM 's3 path'
IAM_ROLE 'iam role'
DELIMITER ','
ESCAPE
IGNOREHEADER 1
MAXERROR AS 5
COMPUPDATE FALSE
ACCEPTINVCHARS
ACCEPTANYDATE
FILLRECORD
EMPTYASNULL
BLANKSASNULL
NULL AS 'null';
Upvotes: 1
Reputation: 4829
Note that Glue is not as robust as one might think; column order plays a major role. Check the order of your target table as well as the table input, and make sure the column order and data types are identical. See the AWS Glue Developer Guide for more info.
In addition, make sure you have disabled 'Job bookmark' in the 'Job details' tab; for any development or generic job this is a major source of headaches and trouble.
Upvotes: 0
Reputation: 157
For me, it turned out that I had executed the scripts against the wrong database within the cluster.
Upvotes: -2
Reputation: 20291
Make sure the correct delimiter is specified in the COPY statement (and in the source files). I ran into the same issue. After a couple of attempts with different delimiters (while unloading a table to S3 files, then copying into another table from those files), I was able to solve the issue by using the delimiter '\t'. Here is the full example in my case:
copy <TABLE-NAME>
from 's3://<FILES/LOCATION>'
access_key_id '<INSERT>'
secret_access_key '<INSERT>'
delimiter '\t'
ignoreheader 1
maxerror 10;
Upvotes: 1
Reputation: 206
I was also facing the same issue while loading the data. I rectified it using the following COPY command:
copy yourtablename
from 'your S3 Locations'
credentials 'your AWS credentials'
delimiter ','
ignoreheader 1
removequotes
emptyasnull
blanksasnull
maxerror 5;
Upvotes: 13
Reputation: 41
Try this:
COPY test_table
FROM 's3://mybucket/test/test_contacts.csv'
WITH credentials AS 'aws_access_key_id=<awskey>;aws_secret_access_key=<mykey>'
delimiter ','
ignoreheader as 1
emptyasnull
blanksasnull
removequotes
escape;
Upvotes: 4