Reputation: 1903
I have a data.csv.gz file in S3 that contains special characters in some rows.
The file also has no headers, but I've created column names in the Postgres table. When I check the object's metadata in S3, the type is Content-Type: binary/octet-stream.
This is the error I'm seeing:
psycopg2.errors.InternalError_: invalid byte sequence for encoding "UTF8": 0x8b
This is what I'm doing that's creating the error:
SELECT aws_s3.table_import_from_s3(
'btr.Ats_20210304',
'ID,NAME,WEBSITE,TYPE,CATEGORY,SUB_CATEGORY,PARENT_ACCOUNT',
'(FORMAT csv, HEADER true, DELIMITER ",")',
'vdw-dev',
'date/hourly/data_0_0_0.csv.gz',
'us-east-1');
I've checked the database's encoding using
SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = 'my_db';
and it's set to UTF8.
Upvotes: 5
Views: 2144
Reputation: 26689
See the documentation from AWS: Importing an Amazon S3 compressed (gzip) file
You need to ensure that the S3 file has the following Amazon S3 metadata:
Content-Encoding: gzip
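For context, the 0x8b in the error message is telling: every gzip stream begins with the magic bytes 0x1f 0x8b, so without the Content-Encoding metadata the import reads the raw compressed bytes and chokes on the second magic byte, which is not valid UTF-8. A quick sketch (the sample CSV content is made up for illustration):

```python
import gzip

# Compress a small CSV payload the same way a .csv.gz file is produced.
payload = gzip.compress(b"ID,NAME\n1,Acme\n")

# The first two bytes are always the gzip magic number: 0x1f 0x8b.
# 0x8b is exactly the byte named in the "invalid byte sequence for
# encoding UTF8" error, because Postgres received compressed data.
print(payload[:2].hex())  # 1f8b
```

If you upload with the AWS CLI, `aws s3 cp` accepts a `--content-encoding gzip` flag that sets this metadata at upload time; it can also be changed on an existing object from the S3 console under the object's metadata.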
Upvotes: 2