git fsck --full only checking directories

Question

I'm serving bare Git repos from my Raspberry Pi. My goal is to run git fsck --full nightly to detect file system issues early. I expect fsck to check both "object directories" and "objects", and to produce output such as

pi@raspi2:/media/usb/git/dw.git $ git fsck --full
Checking object directories: 100% (256/256), done.
Checking objects: 100% (14538/14538), done.

For one of my repos, no objects are checked:

pi@raspi2:/media/usb/git/ts-ch.git.borken $ git --version
git version 2.11.0
pi@raspi2:/media/usb/git/ts-ch.git.borken $ git fsck --full
Checking object directories: 100% (256/256), done.
pi@raspi2:/media/usb/git/ts-ch.git.borken $

I modified one file under /objects (a 322kB .pdf file) and ran fsck again. It showed the same message as before, and no errors.

cd objects/86/
chmod u+w f3e6e674431ab3006cbb56fddecbdb4a7724b4 
echo "foosel" >> f3e6e674431ab3006cbb56fddecbdb4a7724b4 
chmod u-w f3e6e674431ab3006cbb56fddecbdb4a7724b4

All repos are set up the same way; they are bare and have no special config:

pi@raspi2:/media/usb/git/ts-ch.git $ git config --list
core.repositoryformatversion=0
core.filemode=true
core.bare=true

Am I missing something? Why is this modified object not detected? Its SHA1 should certainly not match anymore. Thanks for any hints!

John Szakmeister · Accepted Answer

On the corruption issue

Yes, you are missing something: you didn't corrupt the file in a way that Git pays attention to. Objects stored on disk generally start with the object type, followed by a space, followed by the size (in ASCII digits), followed by a NUL byte. The size states how big the object is, and that's all that Git ends up reading. So tacking data onto the end like that won't actually corrupt the object. If you replaced the contents of the file with something else, then you'd see the issue.
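To make the on-disk layout concrete, here is a minimal Python sketch (not Git's actual code) that builds a loose object by hand and then appends bytes to it the way the question did. The helper names make_loose_object and read_loose_object are made up for this illustration, and a SHA-1 repository is assumed. The appended bytes land after the end of the zlib stream, so a reader that inflates the stream and takes only the number of bytes declared in the header never looks at them.

```python
import hashlib
import zlib
from typing import Tuple


def make_loose_object(obj_type: str, content: bytes) -> Tuple[str, bytes]:
    """Return (object name, on-disk bytes) for a loose object (hypothetical helper)."""
    # Header: type, space, size in ASCII decimal, NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    store = header + content
    # The object name is the SHA-1 of header + content, not of the content alone.
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


def read_loose_object(raw: bytes) -> Tuple[str, bytes, bytes]:
    """Return (type, content, bytes left over after the zlib stream)."""
    inflater = zlib.decompressobj()
    store = inflater.decompress(raw)
    header, _, body = store.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    # Only the number of bytes announced in the header is taken as content.
    return obj_type, body[: int(size)], inflater.unused_data


name, on_disk = make_loose_object("blob", b"hello\n")
tampered = on_disk + b"foosel\n"  # same effect as: echo "foosel" >> <object file>

obj_type, content, trailing = read_loose_object(tampered)
rehashed = hashlib.sha1(
    f"{obj_type} {len(content)}".encode("ascii") + b"\x00" + content
).hexdigest()
print(rehashed == name)  # compares the recomputed name with the original one
print(trailing)          # whatever was left over after the zlib stream
```

The recomputed name equals the original one because the inflated header and content are unchanged; the appended bytes only show up as leftover input after the stream.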

For reference, the object format details are in the Git User's Manual:

Object storage format

All objects have a statically determined "type" which identifies the format of the object (i.e. how it is used, and how it can refer to other objects). There are currently four different object types: "blob", "tree", "commit", and "tag".

Regardless of object type, all objects share the following characteristics: they are all deflated with zlib, and have a header that not only specifies their type, but also provides size information about the data in the object. It’s worth noting that the SHA-1 hash that is used to name the object is the hash of the original data plus this header, so sha1sum file does not match the object name for file.

As a result, the general consistency of an object can always be tested independently of the contents or the type of the object: all objects can be validated by verifying that (a) their hashes match the content of the file and (b) the object successfully inflates to a stream of bytes that forms a sequence of <ascii type without space> + <space> + <ascii decimal size> + <byte\0> + <binary object data>.

The structured objects can further have their structure and connectivity to other objects verified. This is generally done with the git fsck program, which generates a full dependency graph of all objects, and verifies their internal consistency (in addition to just verifying their superficial consistency through the hash).
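The two checks the manual lists, (a) and (b), can be sketched in a few lines of Python. This is an illustration under assumptions, not how git fsck is implemented: check_loose_object is a hypothetical name, the expected object name is passed in rather than derived from the file path, and SHA-1 is assumed.

```python
import hashlib
import re
import zlib
from typing import List

# type, space, ASCII decimal size, NUL byte; the object data follows
HEADER_RE = re.compile(rb"(blob|tree|commit|tag) (\d+)\x00")


def check_loose_object(expected_name: str, raw: bytes) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    try:
        store = zlib.decompressobj().decompress(raw)
    except zlib.error as exc:
        return [f"does not inflate: {exc}"]

    problems = []
    match = HEADER_RE.match(store)
    if match is None:
        problems.append("malformed header")
    elif int(match.group(2)) != len(store) - match.end():
        problems.append("size in header does not match data length")
    # Check (a): the name must be the SHA-1 of header + data.
    if hashlib.sha1(store).hexdigest() != expected_name:
        problems.append("hash mismatch")
    return problems


store = b"blob 6\x00hello\n"
name = hashlib.sha1(store).hexdigest()

print(check_loose_object(name, zlib.compress(store)))                # intact object
print(check_loose_object(name, zlib.compress(b"blob 6\x00jello\n")))  # contents replaced
print(check_loose_object(name, b"not zlib data"))                    # not a zlib stream
```

The second call is the "replaced the contents" case from above: the stream still inflates and the header is well formed, but the hash no longer matches the object name.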

However, there is an interesting interaction that leads me to think that git fsck should be working harder and noticing when the file has garbage at the end. If you attempt to run git gc on that repo, you'll end up seeing an error like this:

:: git gc
Counting objects: 9, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
error: garbage at end of loose object '45b983be36b73c0788dc9cbcb76cbb80fc7bb057'
fatal: loose object 45b983be36b73c0788dc9cbcb76cbb80fc7bb057 (stored in .git/objects/45/b983be36b73c0788dc9cbcb76cbb80fc7bb057) is corrupt
error: failed to run repack

It seems like if git gc can't actually run, then git fsck should be catching the issue.

On why you don't see "Checking objects"

This one is actually really simple: there are no packed objects to check. Packs live in .git/objects/pack (objects/pack in a bare repo like yours). If there are no pack files in that directory, you won't see the "Checking objects" line.
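A quick way to see which case a repository is in is to count the loose object files and the pack files under its objects directory. The Python sketch below does that; count_objects is a hypothetical helper, and it assumes SHA-1 object names (a two-character fan-out directory plus a 38-character file name).

```python
import re
from pathlib import Path
from typing import Tuple, Union

FANOUT_RE = re.compile(r"[0-9a-f]{2}")
LOOSE_RE = re.compile(r"[0-9a-f]{38}")


def count_objects(objects_dir: Union[str, Path]) -> Tuple[int, int]:
    """Return (number of loose object files, number of pack files)."""
    objects_dir = Path(objects_dir)

    loose = 0
    for fanout in objects_dir.iterdir():
        # Loose objects live in objects/xx/ where xx is the first byte of the name.
        if fanout.is_dir() and FANOUT_RE.fullmatch(fanout.name):
            loose += sum(1 for f in fanout.iterdir() if LOOSE_RE.fullmatch(f.name))

    pack_dir = objects_dir / "pack"
    packs = len(list(pack_dir.glob("*.pack"))) if pack_dir.is_dir() else 0
    return loose, packs


# Example call, using the bare repo path from the question:
# loose, packs = count_objects("/media/usb/git/ts-ch.git.borken/objects")
```

If the second number is 0, there are no packs. git count-objects -v should report comparable figures without any scripting.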
