Artem

Reputation: 833

Special characters in AWS Athena show up as question marks

I've created a table in AWS Athena from a CSV file that contains the special characters "æøå". These show up as � in the output. The CSV file is saved with "Unicode" encoding. I've also tried changing the encoding to UTF-8, with no luck. I uploaded the CSV to S3 and then added the table to Athena using the following DDL:

CREATE EXTERNAL TABLE `regions_dk`(
  `postnummer` string COMMENT 'from deserializer', 
  `kommuner` string COMMENT 'from deserializer', 
  `regioner` string COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( 
  'separatorChar'='\;') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://bucket/path'
TBLPROPERTIES (
  'classification'='csv')

I have another table that also includes the characters "æøå", which I added using an ETL script, and there they display correctly.

What am I overlooking?

Upvotes: 1

Views: 9903

Answers (1)

Rony Pacheco

Reputation: 1

I uploaded an ANSI-encoded file to S3, and some of the data came out unreadable. I changed the file's encoding to UTF-8 on my PC, repeated the process, and everything was fine.

I did the conversion with Sublime Text.
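
If you'd rather script the conversion than do it in an editor, something like the following should work. This is only a minimal sketch in Python: it assumes the original file is Windows-1252 (what Windows usually means by "ANSI"), and the function name `reencode_to_utf8` and the file names in the comment are made up for illustration. If your file was saved in a different encoding, pass that as `source_encoding` instead.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Rewrite the text file `src` as UTF-8 at `dst`.

    `source_encoding` is an assumption: "ANSI" on Western European
    Windows installs normally means Windows-1252 (cp1252).
    """
    # Decode the raw bytes using the encoding the file was saved in ...
    text = Path(src).read_bytes().decode(source_encoding)
    # ... then write the same characters back out as UTF-8 bytes.
    Path(dst).write_bytes(text.encode("utf-8"))
    return dst


# Hypothetical usage; the file names are placeholders:
# reencode_to_utf8("regions_dk.csv", "regions_dk_utf8.csv")
```

After converting, upload the new file to the table's S3 location in place of the old one and run the query again.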

Upvotes: 0
