KristiLuna
KristiLuna

Reputation: 1903

ingesting gzip file from s3 to Postgres invalid byte sequence for encoding "UTF8"

I have a data.csv.gz file in S3 that contains special characters in some rows: ( the file also has no headers but I've created columns names in the Postgres table. When I check under metadata in S3, the type is Content-Type: binary/octet-stream

This is the error I'm seeing:

psycopg2.errors.InternalError_: invalid byte sequence for encoding "UTF8": 0x8b

This is what I'm doing that's creating the error:

SELECT aws_s3.table_import_from_s3(
'btr.Ats_20210304',
'ID,NAME,WEBSITE,TYPE,CATEGORY,SUB_CATEGORY,PARENT_ACCOUNT',
'(FORMAT csv, HEADER true, DELIMITER ",")',
'vdw-dev',
'date/hourly/data_0_0_0.csv.gz',
'us-east-1');

I've checked the postgres table's encoding using SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = 'my_db'; and it's set to UTF8.

Upvotes: 5

Views: 2144

Answers (1)

Datageek
Datageek

Reputation: 26689

See the documentation from AWS: Importing an Amazon S3 compressed (gzip) file

You need to ensure that the S3 file has the following Amazon S3 metadata:

  • Key: Content-Encoding
  • Value: gzip

Upvotes: 2

Related Questions