Karthikeya

Reputation: 81

Issue while loading data containing characters like ® and © from Oracle to HDFS - Hadoop Distributed File System

I am using Cloudera Sqoop to fetch data from an Oracle database into HDFS. Everything works fine except that some characters, such as ® and ©, are being converted to ®© in HDFS. (In Oracle, however, the data is stored without any problems.) Is there any way I can store these characters in HDFS as they are?

Sqoop Version: 1.3

Thanks, Karthikeya

Upvotes: 3

Views: 595

Answers (2)

Jarek Jarcec Cecho

Reputation: 1726

I would strongly suggest checking the actual bytes on HDFS rather than looking at how they are displayed. I've seen too many cases where the data was stored just fine (and actually converted to UTF-8 by Sqoop automatically) and only the tool used for reading the data (terminal emulator or whatever else) was mangling the encoding. Download the file from HDFS and simply run hexdump -C on it to verify whether the encoding is indeed broken.
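To know what to look for in the hex dump, it may help to see the expected bytes first. The following is a minimal sketch, not part of Sqoop; the helper name hex_bytes is hypothetical. It prints the bytes of ® and © in UTF-8 and in ISO-8859-1 (Latin-1), and shows how correctly stored UTF-8 looks when a viewer wrongly decodes it as Latin-1.

```python
def hex_bytes(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex, like a hex dump row."""
    return " ".join(f"{b:02x}" for b in data)


sample = "®©"

# Correct UTF-8 storage: each character takes two bytes.
utf8 = sample.encode("utf-8")
print(hex_bytes(utf8))  # c2 ae c2 a9

# Single-byte Latin-1 storage, for comparison.
latin1 = sample.encode("latin-1")
print(hex_bytes(latin1))  # ae a9

# A viewer that assumes Latin-1 renders valid UTF-8 bytes with stray "Â" characters,
# even though the stored data is fine.
print(utf8.decode("latin-1"))  # Â®Â©
```

If the dump of the HDFS file shows c2 ae and c2 a9 at the positions of these characters, the data is valid UTF-8 and the problem is most likely in the tool displaying it.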

Upvotes: 1

haosdent

Reputation: 1055

Which character set does your Oracle database use? Because Hadoop uses UTF-8, you should convert the data from the Oracle database if the encodings differ.
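As a rough illustration of such a conversion, here is a minimal sketch that re-encodes raw bytes from a source character set into UTF-8. The function name to_utf8 is hypothetical, and the choice of ISO-8859-1 as the source encoding is only an assumption for the example; the actual database character set would need to be checked first.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from the given source encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Assumption for illustration: the source data is ISO-8859-1, where
# ® is the single byte 0xAE and © is the single byte 0xA9.
raw_from_source = b"\xae\xa9"

converted = to_utf8(raw_from_source, "iso-8859-1")
print(converted.hex(" "))         # c2 ae c2 a9
print(converted.decode("utf-8"))  # ®©
```

Passing the wrong source encoding produces either a decode error or silently wrong characters, so the source character set should be confirmed rather than guessed.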

Upvotes: 1
