Reputation: 81
I am new to Hadoop. I am transferring data between Hadoop 0.20 and Hadoop 2.2.0 using the distcp command. During the transfer I get the error below:
Check-sum mismatch between hftp://10.0.3.28:50070/hive/warehouse/staging_precall_cdr/operator=idea/PRECALL_CDR_Assam_OCT_JAN.csv and hdfs://10.0.20.118:9000/user/hive/warehouse/PRECALL_CDR_Assam_OCT_JAN.csv
I have also tried -skipcrccheck and -Ddfs.checksum.type=CRC32, but neither solved the problem.
Solutions will be appreciated.
Upvotes: 4
Views: 8275
Reputation: 8522
This looks like a known issue when copying data between Hadoop 0.20 and 2.2.0. It is tracked in Jira: https://issues.apache.org/jira/browse/HDFS-3054
One workaround is to have distcp preserve the block size and checksum type during the copy, using -pbc:
hadoop distcp -pbc <SRC> <DEST>
OR
Skip the CRC check with the -skipcrccheck option, which distcp accepts only together with -update:
hadoop distcp -skipcrccheck -update <SRC> <DEST>
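For illustration, here is a minimal dry-run sketch of both workarounds with the placeholders filled in. The source and destination paths are copied from the error message in the question, so the real destination argument is an assumption, and build_distcp_cmd is a hypothetical helper, not part of Hadoop. The script only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Dry-run sketch: prints the distcp commands instead of executing them.
# SRC and DEST come from the error message in the question; adjust them
# to your own paths (the exact destination argument is an assumption).
SRC='hftp://10.0.3.28:50070/hive/warehouse/staging_precall_cdr/operator=idea/PRECALL_CDR_Assam_OCT_JAN.csv'
DEST='hdfs://10.0.20.118:9000/user/hive/warehouse/PRECALL_CDR_Assam_OCT_JAN.csv'

# build_distcp_cmd is a hypothetical helper, not a Hadoop command.
#   $1 = workaround ("preserve" or "skipcrc"), $2 = source, $3 = destination
build_distcp_cmd() {
    mode=$1
    src=$2
    dest=$3
    case $mode in
        # Workaround 1: preserve block size (b) and checksum type (c).
        preserve) printf 'hadoop distcp -pbc %s %s\n' "$src" "$dest" ;;
        # Workaround 2: skip the CRC comparison; requires -update.
        skipcrc)  printf 'hadoop distcp -update -skipcrccheck %s %s\n' "$src" "$dest" ;;
        *)        printf 'unknown mode: %s\n' "$mode" >&2; return 1 ;;
    esac
}

build_distcp_cmd preserve "$SRC" "$DEST"
build_distcp_cmd skipcrc "$SRC" "$DEST"
```

Because hftp is a read-only filesystem, the printed command should be run on the destination (2.2.0) cluster. Note that skipping the CRC check also means the copy is no longer verified by checksum.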
Upvotes: 4