lucky_start_izumi

Reputation: 2591

Hadoop ChecksumException: Checksum error when copying a local file

We are trying to copy files from the local file system to Hadoop, but occasionally we get:

org.apache.hadoop.fs.ChecksumException: Checksum error: /crawler/twitcher/tmp/twitcher715632000093292278919867391792973804/Televisions_UK.20120912 at 0
    at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
    at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
    at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
    at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
    at java.io.DataInputStream.read(DataInputStream.java:83)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:66)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:45)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:98)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:224)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1119)
    at mcompany.HadoopTransfer.copyToHadoop(HadoopTransfer.java:81)
    at mcompany.apps.Start.pushResultFileToSubfolder(Start.java:498)
    at mcompany.apps.Start.run(Start.java:299)
    at mcompany.apps.Start.main(Start.java:89)
    at mcompany.apps.scheduler.CrawlerJobRoutine.execute(CrawlerJobRoutine.java:15)
    at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
    at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)

ERROR 2012-09-17 16:45:49,991 [amzn_mkpl_Worker-1] mcompany.apps.Start - Unable to push files to outbound location

The exception is thrown when calling copyFromLocalFile. If we delete the .crc file, the copy works fine. Could anyone suggest why this CRC issue occurs? Thank you very much.

Upvotes: 0

Views: 4019

Answers (1)

rystsov

Reputation: 1928

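You should check that the algorithm you are using to calculate the CRC is compatible with the one HDFS uses. The stack trace fails in FSInputChecker while reading the local source file, so the checksum being verified is the one stored in the .crc file next to that file. If the .crc file was written by something other than Hadoop, or the data file changed after the .crc file was written, the two will not match.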
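
The workaround mentioned in the question (deleting the .crc file before the copy) could be sketched roughly as follows. This is only a sketch under assumptions: it relies on Hadoop's local file system keeping the checksum in a hidden sibling file named `.<filename>.crc`, and it assumes that file is stale rather than the data being corrupt. The class and method names (`CrcSidecar`, `sidecarFor`, `deleteStaleSidecar`) are hypothetical and are not part of Hadoop; the code uses only the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not a Hadoop API: removes the checksum sidecar
// that sits next to a local file so it is not verified on the next read.
class CrcSidecar {

    // Builds the sidecar path, assuming the ".<filename>.crc" naming
    // convention in the same directory as the data file.
    static Path sidecarFor(Path dataFile) {
        String name = "." + dataFile.getFileName() + ".crc";
        return dataFile.resolveSibling(name);
    }

    // Deletes the sidecar if it exists; returns true when a file was removed.
    static boolean deleteStaleSidecar(Path dataFile) throws IOException {
        return Files.deleteIfExists(sidecarFor(dataFile));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: CrcSidecar <local-file>");
            return;
        }
        Path dataFile = Paths.get(args[0]);
        boolean removed = deleteStaleSidecar(dataFile);
        System.out.println((removed ? "Removed " : "No sidecar at ") + sidecarFor(dataFile));
    }
}
```

Calling something like `deleteStaleSidecar` just before `copyFromLocalFile` only hides the mismatch. If the local file really is corrupt, the corruption is copied too, so it is still worth finding out which process writes or modifies the file after the .crc file is created.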

Upvotes: 1
