Paolo

Reputation: 2249

Text files uploaded to S3 are encoded strangely?

This is the strangest error, and I don't even know where to start understanding what's wrong.

S3 had been working well until suddenly one day (yesterday) it started mangling any text file uploaded to it. Whenever a text file contains Å, Ä, Ö or other non-English UTF-8 characters, the file comes out garbled. I've tried uploading with various clients as well as the AWS web interface. The upload succeeds, but when I download the file it's garbled. I've tried downloading it to my Mac and to a Raspberry Pi running Linux. Same result.

Is there any encoding done by Amazon's S3 servers?!
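The kind of mangling described can be reproduced locally by decoding UTF-8 bytes with a legacy single-byte encoding; a quick Python sketch (the cp1252 choice is an assumption about what the reading side is doing, not something S3 reports):

```python
# Reproduce the classic mojibake: UTF-8 bytes decoded as cp1252.
original = "Å Ä Ö"
raw = original.encode("utf-8")      # the bytes that actually get stored
mangled = raw.decode("cp1252")      # what a cp1252 reader would display
print(mangled)                      # "Ã… Ã„ Ã–"
round_trip = raw.decode("utf-8")    # decoding as UTF-8 recovers the text
print(round_trip)                   # "Å Ä Ö"
```

If the downloaded bytes are identical to the uploaded ones, the corruption is happening at display time, not in S3.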

Upvotes: 42

Views: 64181

Answers (7)

Giorgio

Reputation: 13529

Adding <meta charset="utf-8" /> in the <head> of the .html files fixed the problem for me.

Upvotes: 4

Raphael Fernandes

Reputation: 861

For those using boto3 (Python 3) to upload and getting strange characters instead of accented ones (as in Portuguese and French, for example): Toni Chaz's and Sony Kadavan's answers gave me the hint to fix it. Adding ";charset=utf-8" to the ContentType argument when calling put_object was enough for the accents to be shown correctly.

content_type = "text/plain;charset=utf-8"
bucket_obj.put_object(Key=key, Body=data, ContentType=content_type)
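A slightly fuller sketch of the same idea, wrapped in a helper so the UTF-8 encoding of the body and the charset in the Content-Type are set in one place (the helper name and the commented bucket setup are assumptions, not part of the original answer):

```python
def put_utf8_text(bucket_obj, key, text):
    """Upload text to S3 as UTF-8 bytes with a matching Content-Type."""
    return bucket_obj.put_object(
        Key=key,
        Body=text.encode("utf-8"),               # send bytes, not a str
        ContentType="text/plain;charset=utf-8",  # tells readers how to decode
    )

# bucket_obj would come from boto3, e.g.:
# import boto3
# bucket_obj = boto3.resource("s3").Bucket("my-bucket")  # bucket name assumed
```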

Upvotes: 8

Eljah

Reputation: 5155

In my case the problem was also on the read side: I was reading the file from the filesystem without specifying UTF-8, so the wrong encoding ended up in S3 until I used

InputStreamReader isr = new InputStreamReader(fileInputStream, "UTF-8");

instead of

InputStreamReader isr = new InputStreamReader(fileInputStream);

Please watch out for this possible cause too.
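The same read-side pitfall exists in Python: without an explicit encoding, open() uses the platform default, which may not be UTF-8 (e.g. cp1252 on Windows). A small self-contained sketch:

```python
import os
import tempfile

# Write a file containing non-ASCII characters as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Åäö")

# Read it back with the encoding stated explicitly; relying on the
# platform default is how the wrong bytes sneak into an upload.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()
print(text)  # "Åäö"
```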

Upvotes: 1

Toni Chaz

Reputation: 741

I had the same problem, and I solved it by adding charset=utf-8 under Properties -> Metadata for the file.

(screenshot: setting the Content-Type metadata in the S3 console)

Upvotes: 34

rmacinnis

Reputation: 56

Not sure why, but the answer from Sony Kadavan didn't work in my case.

Rather than:

Content-Type: text/plain; charset=utf-8

I used:

Content-Type: text/html; charset=utf-8

Which seemed to work.

Upvotes: 3

Sony Kadavan

Reputation: 4052

You can explicitly set "Content-Type: text/plain; charset=utf-8" on the file in the S3 console.

This tells S3 to serve the file as UTF-8 text.
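For existing objects, the same change can be made from code. S3 object metadata can't be edited in place, so the usual approach is to copy the object over itself with the new Content-Type; a hedged boto3 sketch (the function and variable names are assumptions):

```python
def set_utf8_content_type(s3_client, bucket, key):
    """Rewrite an existing object's Content-Type by copying it over itself."""
    return s3_client.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        ContentType="text/plain; charset=utf-8",
        MetadataDirective="REPLACE",  # required to change metadata on copy
    )

# s3_client would come from boto3, e.g.:
# import boto3
# s3_client = boto3.client("s3")
# set_utf8_content_type(s3_client, "my-bucket", "notes.txt")  # names assumed
```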

Upvotes: 14

oozmac

Reputation: 161

If your data includes non-ASCII multibyte characters (such as Chinese or Cyrillic characters), you must load the data to VARCHAR columns. The VARCHAR data type supports four-byte UTF-8 characters, but the CHAR data type only accepts single-byte ASCII characters.

Source: http://docs.aws.amazon.com/redshift/latest/dg/t_loading_unicode_data.html

Upvotes: -8
