Alby

Reputation: 5742

checksum and md5, not the same thing?

I downloaded a file and used md5sum to see if the download was successful without corruption. I got the following value:

 a7099fcf9572d91b10d0073b07e112cb  ./Macaca_mulatta.MMUL_1.70.dna.chromosome.1.fa.gz

But when I checked the website I downloaded the file from, it gave me the following value.

10256 63747 Macaca_mulatta.MMUL_1.70.dna.chromosome.1.fa.gz

What is this 10-digit code? Is it not MD5?

I downloaded the file from: ftp://ftp.ensembl.org/pub/release-70/fasta/macaca_mulatta/dna/CHECKSUMS

Upvotes: 9

Views: 3856

Answers (3)

Sten Petrov

Reputation: 11040

They are not the same thing. MD5 is a checksum but there are other checksum algorithms that are not MD5, such as SHA, CRC etc.

Generally, a checksum is a function that maps an input of arbitrary size to a short, fixed-size output and, ideally, produces a very different output even if only one bit of the input changes.
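To illustrate the "one bit changes, output changes a lot" property, here is a minimal sketch using Python's standard hashlib module. The input string is an arbitrary example of mine, not anything from the Ensembl file.

```python
import hashlib

# Arbitrary example input (an assumption for illustration only).
original = b"checksum example"

# Flip the lowest bit of the first byte so the two inputs differ by exactly one bit.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

digest_original = hashlib.md5(original).hexdigest()
digest_flipped = hashlib.md5(flipped).hexdigest()

print(digest_original)
print(digest_flipped)

# Count how many of the hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(digest_original, digest_flipped))
print(f"{differing} of {len(digest_original)} hex characters differ")
```

Both digests are 32 hex characters long regardless of the input size, which is the fixed-size output mentioned above.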

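The output you're looking at consists of two 5-digit decimal numbers, which is the format printed by the unix sum command: a 16-bit checksum followed by the file size in blocks. It is not CRC32 (that is what cksum prints, as a single number of up to 10 digits). The sum command may be used to calculate/verify it.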

Upvotes: 7

tweep

Reputation: 146

Ensembl is using the unix 'sum' utility to calculate the values in the CHECKSUMS file.

Here's more info about the program : http://en.wikipedia.org/wiki/Sum_%28Unix%29

To see if your download is correct, try:

sum Macaca_mulatta.MMUL_1.70.dna.chromosome.1.fa.gz
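If sum is not available on your machine, the same check can be sketched in Python. This assumes Ensembl used the BSD variant of the algorithm (a 16-bit rotating checksum plus a count of 1024-byte blocks, which is what GNU sum prints by default); I have not confirmed that. The function names bsd_sum, read_chunks and matches_listing are my own, not part of any library.

```python
def bsd_sum(chunks):
    """Return (checksum, blocks) for an iterable of bytes objects.

    Assumption: BSD-style sum, i.e. rotate the 16-bit checksum right by one
    bit, add the next byte, keep 16 bits; size reported in 1024-byte blocks.
    """
    checksum = 0
    size = 0
    for chunk in chunks:
        for byte in chunk:
            # Rotate right by one bit within 16 bits.
            checksum = (checksum >> 1) | ((checksum & 1) << 15)
            # Add the byte and truncate back to 16 bits.
            checksum = (checksum + byte) & 0xFFFF
        size += len(chunk)
    blocks = (size + 1023) // 1024  # round up to whole blocks
    return checksum, blocks


def read_chunks(path, chunk_size=1 << 20):
    """Yield the file at `path` in pieces so it need not fit in memory."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def matches_listing(line, checksum, blocks):
    """Compare computed values with a 'CHECKSUM BLOCKS NAME' line."""
    fields = line.split()
    return int(fields[0]) == checksum and int(fields[1]) == blocks
```

Usage would look like `matches_listing("10256 63747 Macaca_mulatta.MMUL_1.70.dna.chromosome.1.fa.gz", *bsd_sum(read_chunks(path)))`, where path points at your download. The byte-by-byte loop is slow in pure Python for a file of this size, so the sum command itself is the more practical choice.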

NOTE: Ensembl has failed to update its CHECKSUMS file in the past, so it is possible for the download to be correct while the listed checksum is wrong.

Upvotes: 11

ahouse101

Reputation: 362

MD5 is one way to compute a checksum, but there are others. CRC is one, so is SHA. All MD5 does is produce a hash code, and it is not the only algorithm to do so. I'm not sure what the 10-digit one is, but it can't be MD5.
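One quick way to see why the value from the website can't be MD5 is to compare output sizes. The sketch below uses Python's standard hashlib and zlib modules; the payload and the helper name looks_like_md5 are illustrative assumptions of mine.

```python
import hashlib
import re
import zlib

# Arbitrary example payload (an assumption for illustration only).
data = b"example payload"

md5_hex = hashlib.md5(data).hexdigest()       # 128 bits -> 32 hex characters
sha1_hex = hashlib.sha1(data).hexdigest()     # 160 bits -> 40 hex characters
crc32_hex = format(zlib.crc32(data), "08x")   # 32 bits  -> 8 hex characters

print(len(md5_hex), len(sha1_hex), len(crc32_hex))


def looks_like_md5(text):
    """Hypothetical helper: True if `text` is exactly 32 hex characters."""
    return re.fullmatch(r"[0-9a-fA-F]{32}", text) is not None


# The two values from the question.
print(looks_like_md5("a7099fcf9572d91b10d0073b07e112cb"))
print(looks_like_md5("10256 63747"))
```

An MD5 digest is always 32 hex characters, so a pair of 5-digit decimal numbers has the wrong shape for it.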

Upvotes: 1
