duxsco

Reputation: 361

Create checksum of large sparse image in Linux

I have several sparse images on my Linux server (320G total size; 111G actually used) and would like to compute a checksum of each of them every night. Is there an efficient way to create the checksum? With the straightforward approach shown below, the checksum takes a long time even for an image that has no allocated data at all:

~ # dd bs=1 count=0 seek=5G if=/dev/zero of=sparse.img
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00036461 s, 0.0 kB/s
~ # du -hs sparse.img
0   sparse.img
~ # time sha512sum sparse.img
e4f21997407b9cb0df347f6eba2...  sparse.img
real    0m55.339s
user    0m52.010s
sys     0m2.790s

Upvotes: 3

Views: 1210

Answers (2)

koo5

Reputation: 485

There has been a good solution since 2016. Starting with version 1.29, GNU tar lists this change: "If possible, use SEEK_DATA/SEEK_HOLE to detect sparse files." Sparse-file detection in general is enabled by passing --sparse, so for example: tar -c --sparse <file name> | md5sum gives you a repeatable way to checksum your file, and it only reads the file once.
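A minimal sketch of how the tar pipeline above could be wrapped for a nightly job, using sha512sum as in the question. The function name sparse_sum is hypothetical. The extra flags are assumptions added here, not part of the original suggestion: --mtime, --owner, --group and --numeric-owner pin the header metadata so that only the name, mode, size and data layout feed the checksum, and --format=gnu is requested explicitly because the POSIX/pax sparse format embeds the tar process ID in the stored member name, which would change the checksum on every run.

```shell
# sparse_sum FILE
# Print a SHA-512 over a sparse-aware GNU tar stream of FILE.
# Hypothetical helper; requires GNU tar (ideally >= 1.29 so that
# SEEK_DATA/SEEK_HOLE can be used) and coreutils.
sparse_sum() {
    # Metadata is normalised (assumption) so that touching the file or
    # changing its owner does not alter the result.
    tar -c --sparse --format=gnu \
        --mtime=@0 --owner=0 --group=0 --numeric-owner \
        -- "$1" | sha512sum | cut -d' ' -f1
}
```

Usage would be something like sparse_sum sparse.img. Note that the result is a checksum of the tar stream, not of the raw file, so it is only comparable with other values produced the same way. The stream also records where the holes are, so two files with identical content but a different hole layout may produce different checksums.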

Upvotes: 1

There isn't really a good solution for this, because (a) all those zeroes are still part of the content of the file, even though no storage is allocated for them, and (b) there don't seem to be any tools designed to manipulate sparse files. Admittedly, GNU tar (and various other backup/archive products) can be told to handle sparse files, but I've never seen one that actually queries the filesystem for an allocation map. The documentation for GNU tar, for instance, clearly indicates that it handles sparse files by explicitly searching the file content for runs of zeroes (and that it does this as a preprocessing step, not inline, with the effect that files being archived are read twice).

Ideally, there would be a way to access the allocated blocks of a file directly. I can think of several ways to implement such a thing, but it would have to have been proposed and specified at least ten years ago for it to be useful at this point.
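This does not give access to the allocated blocks themselves, but as a rough illustration, GNU stat can at least report how much of a file is actually allocated, which mirrors the du output in the question. The function name sparse_info is hypothetical, and the sketch assumes GNU coreutils stat with its -c format option.

```shell
# sparse_info FILE
# Print the apparent size and the allocated size of FILE in bytes.
# Hypothetical helper; assumes GNU coreutils `stat`.
sparse_info() {
    apparent=$(stat -c %s -- "$1")   # logical file size in bytes
    blocks=$(stat -c %b -- "$1")     # number of allocated blocks
    blksize=$(stat -c %B -- "$1")    # size in bytes of each such block
    printf '%s apparent=%s allocated=%s\n' \
        "$1" "$apparent" "$((blocks * blksize))"
}
```

A large gap between the two numbers indicates a sparse file, where most of the time spent by a plain checksum goes into hashing zeroes that were never stored.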

Unfortunately, there doesn't seem to be a good answer to your question. I can only suggest that you record whatever checksums your backup system gives you and use those to verify the backup media before a restore, but you're probably already doing that.

Upvotes: 2
