Reputation: 2249
This is the strangest error, and I don't even know where to start figuring out what's wrong.
S3 had been working well until suddenly, one day (yesterday), it started mangling any text file uploaded to it. Whenever a file contains Å, Ä, Ö, or any other non-English UTF-8 characters, the downloaded file is garbled. I've tried uploading with various clients as well as the AWS web interface. The upload succeeds, but when I download the file it's garbled. I've tried downloading it to my Mac and to a Raspberry Pi running Linux. Same error.
Is there any encoding done by Amazon's S3 servers?!
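(For what it's worth, the "messed up" text described here looks like classic mojibake: UTF-8 bytes being decoded by the client with a legacy codepage such as Windows-1252. A quick Python sketch, using hypothetical sample text, reproduces the symptom:)

```python
# UTF-8 bytes of Swedish characters, misread as Windows-1252:
text = "Åäö"
utf8_bytes = text.encode("utf-8")      # b'\xc3\x85\xc3\xa4\xc3\xb6'
mojibake = utf8_bytes.decode("cp1252") # "Ã…Ã¤Ã¶" -- the garbled look
print(mojibake)
```

That is, the bytes themselves are fine; only the declared (or guessed) charset is wrong, which is why the charset-related fixes in the answers below work.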
Upvotes: 42
Views: 64181
Reputation: 13529
Adding <meta charset="utf-8" />
in the <head>
of the .html files fixed the problem for me.
Upvotes: 4
Reputation: 861
For those using boto3 (Python 3) to upload and getting strange characters instead of accented letters (as in Portuguese and French, for example): Toni Chaz's and Sony Kadavan's answers gave me the hint for the fix. Adding ";charset=utf-8" to the ContentType argument when calling put_object was enough for the accents to display correctly.
content_type="text/plain;charset=utf-8"
bucket_obj.put_object(Key=key, Body=data, ContentType=content_type)
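A minimal sketch of how the pieces fit together; the helper name and the key/text values are my own illustration, only put_object and its ContentType argument come from the answer above:

```python
def utf8_put_kwargs(key: str, text: str) -> dict:
    """Build put_object arguments that preserve non-ASCII text.

    Encoding the body explicitly and declaring the charset in
    ContentType stops downstream clients from guessing Latin-1.
    """
    return {
        "Key": key,
        "Body": text.encode("utf-8"),
        "ContentType": "text/plain;charset=utf-8",
    }

kwargs = utf8_put_kwargs("notes.txt", "Åäö")
# With a real boto3 bucket resource (name is hypothetical):
# bucket_obj.put_object(**kwargs)
```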
Upvotes: 8
Reputation: 5155
In my case the problem was also on the read side: the file was read from the filesystem without specifying UTF-8, so it reached S3 with the wrong encoding. It was fixed once I used
InputStreamReader isr = new InputStreamReader(fileInputStream, "UTF-8");
instead of
InputStreamReader isr = new InputStreamReader(fileInputStream);
Please watch out for this possible problem too.
Upvotes: 1
Reputation: 741
I had the same problem, and I solved it by adding charset=utf-8
in Properties -> Metadata of the file
Upvotes: 34
Reputation: 56
Not sure why, but the answer from Sony Kadavan didn't work in my case.
Rather than:
Content-Type: text/plain; charset=utf-8
I used:
Content-Type: text/html; charset=utf-8
Which seemed to work.
Upvotes: 3
Reputation: 4052
You can explicitly set "Content-Type: text/plain; charset=utf-8" on the file in the S3 console.
This tells S3 to serve the file as UTF-8 text.
Upvotes: 14
Reputation: 161
If your data includes non-ASCII multibyte characters (such as Chinese or Cyrillic characters), you must load the data into VARCHAR columns. The VARCHAR data type supports four-byte UTF-8 characters, but the CHAR data type only accepts single-byte ASCII characters.
Source: http://docs.aws.amazon.com/redshift/latest/dg/t_loading_unicode_data.html
Upvotes: -8