Reputation: 81
I am using Cloudera Sqoop to fetch data from an Oracle database into HDFS. Everything works fine except that some characters, such as ® and ©, are being converted to ®© in HDFS (in Oracle the data is stored without any problems). Is there any way I can store these characters in HDFS as they are?
Sqoop Version: 1.3
Thanks, Karthikeya
Upvotes: 3
Views: 595
Reputation: 1726
I would strongly suggest checking the actual bytes on HDFS rather than looking at how they are displayed. I've seen too many cases where the data was stored just fine (and actually converted to UTF-8 by Sqoop automatically) and it was only the terminal emulator, or whatever else was used to read the data, that was messing with the encoding. Download the file from HDFS and simply run hexdump -C on it to verify whether the encoding is indeed broken.
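If you'd rather script the check than read a hex dump by eye, here is a minimal Python sketch. It assumes you have already copied the file out of HDFS (for example with hdfs dfs -get) and read its bytes. The function names hex_bytes and classify are my own, not part of Sqoop or Hadoop. Correctly encoded UTF-8 for ® is the byte pair c2 ae and for © it is c2 a9; if you see c3 82 c2 ae instead, the data was encoded twice, and a lone ae means it is Latin-1, not UTF-8.

```python
def hex_bytes(raw: bytes) -> str:
    """Return the bytes as space-separated hex, similar to a hexdump column."""
    return raw.hex(" ")


def classify(raw: bytes) -> str:
    """Rough guess at how the bytes are encoded (hypothetical helper).

    Returns "not-utf8", "double-encoded", or "utf8".
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid UTF-8, e.g. raw Latin-1 bytes such as 0xAE for the (R) sign.
        return "not-utf8"
    try:
        # If the text survives a Latin-1 round trip back into valid UTF-8,
        # it was most likely UTF-8 that got encoded a second time.
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf8"
    return "double-encoded" if repaired != text else "utf8"


good = "®©".encode("utf-8")
doubled = good.decode("latin-1").encode("utf-8")

print(hex_bytes(good), classify(good))
print(hex_bytes(doubled), classify(doubled))
```

If classify reports "utf8" for the bytes from your file, the data in HDFS is fine and the problem is in whatever you use to view it.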
Upvotes: 1
Reputation: 1055
Which character set does your Oracle database use? Hadoop uses UTF-8, so if the two differ you should convert the data coming from the Oracle database.
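As a rough illustration of what such a conversion means at the byte level, here is a small Python sketch. It assumes the exported bytes are in a single-byte encoding such as Latin-1 (which is what I'd expect from an Oracle character set like WE8ISO8859P1); the helper name to_utf8 and the choice of source encoding are assumptions, so check your database's actual character set before relying on it.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from source_encoding to UTF-8 (hypothetical helper)."""
    return raw.decode(source_encoding).encode("utf-8")


# In Latin-1 the (R) and (C) signs are the single bytes 0xAE and 0xA9.
latin1_bytes = b"\xae\xa9"
utf8_bytes = to_utf8(latin1_bytes, "latin-1")
print(utf8_bytes.hex(" "))
```

Only do this if the bytes really are in the source encoding; running it on data that is already UTF-8 produces exactly the kind of doubled characters described in the question.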
Upvotes: 1