rahul sorot

Reputation: 81

Getting a checksum mismatch during data transfer between two different versions of Hadoop

I am new to Hadoop. I am transferring data between Hadoop 0.20 and Hadoop 2.2.0 using the distcp command. During the transfer I get the following error:

Check-sum mismatch between hftp://10.0.3.28:50070/hive/warehouse/staging_precall_cdr/operator=idea/PRECALL_CDR_Assam_OCT_JAN.csv and hdfs://10.0.20.118:9000/user/hive/warehouse/PRECALL_CDR_Assam_OCT_JAN.csv

I have also tried -skipcrccheck and -Ddfs.checksum.type=CRC32, but neither solved the problem. Any solutions would be appreciated.

Upvotes: 4

Views: 8275

Answers (1)

SachinJose

Reputation: 8522

This looks like a known issue, tracked in JIRA, with copying data between Hadoop 0.20 and 2.2.0: https://issues.apache.org/jira/browse/HDFS-3054.

One workaround is to preserve the block size (b) and checksum type (c) during the distcp copy by passing -pbc:

hadoop distcp -pbc <SRC> <DEST>

OR

Skip the CRC check with the -skipcrccheck option. Note that distcp only accepts -skipcrccheck together with -update:

hadoop distcp -skipcrccheck -update <SRC> <DEST>
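The two workarounds can be chained in a small wrapper that tries -pbc first and only falls back to skipping the CRC check if that copy fails. This is a minimal sketch, not part of the original answer: it assumes it runs on the Hadoop 2.2.0 (destination) cluster with the 0.20 source read over hftp://, and the copy_with_fallback function and the HADOOP_CMD and DRY_RUN environment variables are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Hypothetical wrapper: try "distcp -pbc" first, then fall back to
# "distcp -skipcrccheck -update". HADOOP_CMD and DRY_RUN are knobs
# invented for this sketch, not Hadoop settings.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: copy_with_fallback SRC DEST
copy_with_fallback() {
  src="$1"
  dest="$2"
  hadoop_cmd="${HADOOP_CMD:-hadoop}"

  # Workaround 1: preserve block size (b) and checksum type (c).
  if run "$hadoop_cmd" distcp -pbc "$src" "$dest"; then
    return 0
  fi

  echo "distcp -pbc failed; retrying with -skipcrccheck -update" >&2
  # Workaround 2: skip the CRC comparison entirely.
  # distcp only accepts -skipcrccheck together with -update.
  run "$hadoop_cmd" distcp -skipcrccheck -update "$src" "$dest"
}
```

After sourcing the script, `DRY_RUN=1 copy_with_fallback <SRC> <DEST>` prints the first distcp command without running it. Keep in mind that the fallback path copies without any checksum verification, so it trades safety for getting the data across.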

Upvotes: 4
