Reputation: 5742
I downloaded a file and ran md5sum on it to check that the download completed without corruption. I got the following value:
a7099fcf9572d91b10d0073b07e112cb ./Macaca_mulatta.MMUL_1.70.dna.chromosome.1.fa.gz
But when I checked the site I downloaded the file from, it listed the following value:
10256 63747 Macaca_mulatta.MMUL_1.70.dna.chromosome.1.fa.gz
What is this 10-digit code? Is it not MD5?
I downloaded the file from: ftp://ftp.ensembl.org/pub/release-70/fasta/macaca_mulatta/dna/CHECKSUMS
Upvotes: 9
Views: 3856
Reputation: 11040
They are not the same thing. MD5 is one checksum algorithm, but there are others, such as SHA and CRC.
Generally, a checksum is a function that maps an input to a much shorter output, and a good one produces a very different output even if only one bit of the input changes.
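To illustrate the "one changed bit" property, here is a minimal sketch using Python's standard hashlib module. The input bytes and the helper name md5_hex are made up for the example; they have nothing to do with the Ensembl file.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical input; any bytes would do.
original = b"ACGTACGTACGT"

# Flip the lowest bit of the first byte, leaving the rest untouched.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

digest_a = md5_hex(original)
digest_b = md5_hex(flipped)

# Count how many hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(digest_a, digest_b))

print(digest_a)
print(digest_b)
print("hex characters that differ:", differing)
```

With a well-behaved hash, most of the characters should differ even though the inputs differ in a single bit.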
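The output you're looking at consists of two 5-digit decimal numbers, which is the format printed by the Unix sum command: a 16-bit checksum followed by the file size in blocks. It is not a CRC32 (that is what cksum prints), and it is much weaker than MD5. Run sum on your download and compare the result with the listed value to verify it.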
Upvotes: 7
Reputation: 146
Ensembl uses the Unix 'sum' utility to calculate the values in the CHECKSUMS file.
Here's more info about the program: http://en.wikipedia.org/wiki/Sum_%28Unix%29
To see if your download is correct, try:
sum Macaca_mulatta.MMUL_1.70.dna.chromosome.1.fa.gz
NOTE: Ensembl has failed to update its CHECKSUMS file before, so it is always possible that the download is correct but the CHECKSUMS file is not.
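For anyone without sum at hand, here is a rough Python sketch of what it computes. It assumes the BSD algorithm (a 16-bit rotating checksum plus a count of 1024-byte blocks), which is what GNU sum does by default as far as I know. The function names and the output formatting are my own, so treat this as an illustration rather than a drop-in replacement.

```python
from typing import Iterable, Tuple


def bsd_sum_chunks(chunks: Iterable[bytes], block_size: int = 1024) -> Tuple[int, int]:
    """Return (checksum, blocks) for a stream of byte chunks, BSD-sum style."""
    checksum = 0
    total = 0
    for chunk in chunks:
        total += len(chunk)
        for byte in chunk:
            # Rotate the 16-bit checksum right by one bit...
            checksum = (checksum >> 1) | ((checksum & 1) << 15)
            # ...then add the next byte and keep only the low 16 bits.
            checksum = (checksum + byte) & 0xFFFF
    # Number of blocks, rounded up (assumed block size: 1024 bytes).
    blocks = -(-total // block_size)
    return checksum, blocks


def bsd_sum(data: bytes) -> Tuple[int, int]:
    """Convenience wrapper for data that is already in memory."""
    return bsd_sum_chunks([data])


def bsd_sum_file(path: str, chunk_size: int = 1 << 20) -> Tuple[int, int]:
    """Checksum a file on disk without loading it all at once."""
    with open(path, "rb") as handle:
        return bsd_sum_chunks(iter(lambda: handle.read(chunk_size), b""))


if __name__ == "__main__":
    checksum, blocks = bsd_sum(b"ab")
    # Formatting chosen to resemble the two-number line in the CHECKSUMS file.
    print(f"{checksum:05d} {blocks:5d}")
```

The per-byte loop is slow in pure Python, so for a file the size of a chromosome the real sum command is the practical choice; the sketch is only meant to show where the two numbers come from.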
Upvotes: 11
Reputation: 362
MD5 is one way to compute a checksum, but there are others: CRC is one, and so is SHA. All MD5 does is produce a hash code, and it is not the only algorithm that does so. I'm not sure what the 10-digit one is, but it can't be MD5, since an MD5 digest is always 32 hexadecimal characters.
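A quick way to tell these algorithms apart is by the size of their output. The snippet below is a small sketch using Python's standard hashlib and zlib modules; the sample bytes are arbitrary.

```python
import hashlib
import zlib

# Arbitrary sample input, chosen only for the demonstration.
data = b"example"

# Length of each digest when written as hexadecimal characters.
hex_lengths = {
    "md5": len(hashlib.md5(data).hexdigest()),
    "sha1": len(hashlib.sha1(data).hexdigest()),
    "sha256": len(hashlib.sha256(data).hexdigest()),
}

# zlib.crc32 returns the CRC as an unsigned 32-bit integer.
crc = zlib.crc32(data)

for name, length in hex_lengths.items():
    print(f"{name}: {length} hex characters")
print(f"crc32: {crc:08x} (8 hex characters)")
```

A value made of two short decimal numbers matches none of these lengths, which is a hint that some other tool produced it.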
Upvotes: 1