phils

Reputation: 73246

Making git output full (un-abbreviated) hashes for all commands?

Question Update

(n.b. I've accepted Roland's answer, as it is indeed the correct (and simplest) solution starting from git 1.7.4.4, but please consider this question open regarding earlier versions of git down to 1.7.0.4.)

This question is a bit rambling (primarily due to the edits resulting from my subsequent attempts to establish more information on the situation), but the text in the title is the most important bit.

That is: I'm trying to establish the definitive way to ensure that all git commands will display full (un-abbreviated) hashes in their output.

As I am focussed on backwards-compatibility, this needs to cover older versions of git 1.7. Ideally the solutions would work for git 1.7.0.4 (which is used in the still-supported Ubuntu 10.04 LTS), but I'd be happy with a minimum of 1.7.2.5 (for Debian 6 / Squeeze LTS). Solutions requiring a version later than 1.7.9.5 (Ubuntu 12.04 LTS) are definitely not ideal, but I'd still love to hear about them.

Please note that I do not wish to lose the ability to have abbreviated hashes -- the purpose behind this question is to ensure that tools interacting with git can always access a complete and unambiguous hash. When I use git manually on the command line I am going to want the normal abbreviations most of the time.

Roland Smith's suggestion of utilising a command-line argument override for core.abbrev looked ideal, but sadly only works since v1.7.4.4 (as core.abbrev did not previously exist). I suspect this means we do need to determine the most comprehensive set of command-specific arguments (such as git blame -l) to produce the equivalent effect.

Original Question with Edits

Some (most?) git commands default to outputting abbreviated hashes. For instance, both git blame and git annotate do this, and this was tripping up the current Emacs support when clashes arose (as they can in git versions up to 1.7.11.1 -- see Edit 1 below), because the ambiguous hashes then caused errors when attempting to act upon them.


Begin Edit 1

I note the following in the Changelog, which suggests that the original problem which prompted this question would not arise in more recent versions of git.

Fixes since v1.7.11.1
---------------------
 * "git blame" did not try to make sure that the abbreviated commit
   object names in its output are unique.

If it's the case that git is supposed to guarantee uniqueness (at least at the time the command is run) for all object names reported by any git command, then that would significantly alleviate my concerns; but obviously a solution to the question which supports earlier versions of git is still going to be of interest.

End Edit 1


That can be fixed with git blame -l and git annotate -l, but I don't know whether these two commands are isolated cases or not, and I want to ensure that this issue can't arise in other situations.
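As a rough sketch of this per-command approach for older git versions, a small shell wrapper could add -l for the two commands named above. The function name git_long_hashes is hypothetical, and the command list is deliberately incomplete: anything the wrapper does not know about is refused rather than run with possibly abbreviated output.

```shell
# Hypothetical wrapper: add the long-hash flag for commands known to need one.
# Only blame and annotate are covered, since those are the two named above.
git_long_hashes() {
    [ "$#" -gt 0 ] || return 2
    cmd=$1
    shift
    case "$cmd" in
        blame|annotate)
            # -l makes both commands print the full object name.
            git "$cmd" -l "$@"
            ;;
        *)
            echo "git_long_hashes: no known full-hash flag for '$cmd'" >&2
            return 2
            ;;
    esac
}
```

It would be called as git_long_hashes blame somefile, and further commands could be added to the case statement as their flags are confirmed.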

The only related configurations I can see are core.abbrev:

Set the length object names are abbreviated to. If unspecified, many commands abbreviate to 7 hexdigits, which may not be enough for abbreviated object names to stay unique for sufficiently long time.

(but I don't want to remove the option of seeing an abbreviated commit), and log.abbrevCommit which:

If true, makes git-log(1), git-show(1), and git-whatchanged(1) assume --abbrev-commit. You may override this option with --no-abbrev-commit.

The --no-abbrev-commit argument isn't a consistent thing, though -- I presume that only the commands mentioned in that quote recognise it (but see Edit 2 below).


Begin Edit 2

The parse-options API document states:

Boolean long options can be negated (or unset) by prepending no-, e.g. --no-abbrev instead of --abbrev. Conversely, options that begin with no- can be negated by removing it.

So the commands which accept --abbrev (of which there are many) will in fact all accept --no-abbrev as well? This negated option is often not mentioned (although --abbrev=40 would currently be equivalent, of course, even if no negation was available).

It's not clear to me when the default boolean negation option feature was introduced, however.

In my version 1.7.9.5 git-blame --no-abbrev results in single-character object names. In fact it's the same as --abbrev=0, as blame uses n+1 characters. Conversely I notice that git branch -v --abbrev=0 gives the full 40 characters.

End Edit 2


A complete list of the potential problem commands with their appropriate options would be excellent, although the ideal solution would be something that would (or at least should) be respected by all git commands (including future commands) while maintaining the ability to display abbreviated hashes when desired.

An ugly approach which occurred to me was to create a git config file which imports the original config file (although I note that importing is only available from 1.7.10) and then sets core.abbrev to 40; and to use this via a temporary GIT_CONFIG environment variable when invoking git, whenever full commits are a necessity. I guess this would work, but I'd rather not do it.

Clearly there are/were bugs, and some of the bugs at least have since been fixed; but as the aim is supporting as many (as practical) versions of git that a user might reasonably happen to be running, I'm looking for something which is backwards-compatible.

For what it's worth, here's what I've gleaned from grepping the manual for version 1.7.12.4:

Commands accepting --abbrev (and thus in theory also --no-abbrev):

Other options:

Upvotes: 13

Views: 1605

Answers (4)

VonC

Reputation: 1323175

Updated answer (2021): Git 2.31 (Q1 2021) adds a simpler option.

The configuration variable 'core.abbrev' can be set to 'no' to force no abbreviation regardless of the hash algorithm.

That will be important when Git switches from SHA-1 to SHA-256.

See commit a9ecaa0 (01 Sep 2020) by Eric Wong (ele828).
(Merged by Junio C Hamano -- gitster -- in commit 6dbbae1, 15 Jan 2021)

core.abbrev=no: disables abbreviations

Signed-off-by: Eric Wong

This allows users to write hash-agnostic scripts and configs by disabling abbreviations.

Using "-c core.abbrev=40" will be insufficient with SHA-256, and "-c core.abbrev=64" won't work with SHA-1 repos today.

[jc: tweaked implementation, added doc and a test]

git config now includes in its man page:

If set to "no", no abbreviation is made and the object names are shown in their full length.
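A tool that has to support both old and new Git could pick the override from the installed version. The following is a minimal sketch under stated assumptions: the function name abbrev_override is hypothetical, the 2.31 threshold is taken from this answer, and the fallback of 40 only suits SHA-1 repositories (and itself needs a git new enough to know core.abbrev).

```shell
# Hypothetical helper: choose a core.abbrev override for a given git version.
# $1 is a version string such as "2.31.0" or "1.7.9.5".
# Note: plain sh has no local variables, so major/rest/minor are global.
abbrev_override() {
    major=${1%%.*}      # text before the first dot
    rest=${1#*.}        # text after the first dot
    minor=${rest%%.*}   # second version component
    if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 31 ]; }; then
        echo "core.abbrev=no"
    else
        echo "core.abbrev=40"
    fi
}

# Possible usage (assumes `git version` prints "git version X.Y.Z ..."):
#   ver=$(git version | awk '{print $3}')
#   git -c "$(abbrev_override "$ver")" log --oneline
```

Keeping the version test separate from the git call means the decision can be checked without a repository.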


Setting core.abbrev too early, before the repository set-up (typically in "git clone"), caused a segfault, which has been corrected with Git 2.46 (Q3 2024), batch 15.

See commit 037df60, commit 59ff92c, commit 524c018 (12 Jun 2024) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit 4401639, 20 Jun 2024)

config: fix segfault when parsing "core.abbrev" without repo

Reported-by: Kyle Lippincott
Signed-off-by: Patrick Steinhardt

The "core.abbrev" config allows the user to specify the minimum length when abbreviating object hashes.
Next to the values "auto" and "no", this config also accepts a concrete length that needs to be bigger or equal to the minimum length and smaller or equal to the hash algorithm's hex length.
While the former condition is trivial, the latter depends on the object format used by the current repository.
It is thus a variable upper boundary that may either be 40 (SHA-1) or 64 (SHA-256).

This has two major downsides.

  • First, the user that specifies this config must be aware of the object hashes that its repository use.
    If they want to configure the value globally, then they cannot pick any value in the range [41, 64] if they have any repository that uses SHA-1.
    If they did, Git would error out when parsing the config.

  • Second, and more importantly, parsing "core.abbrev" crashes when outside of a Git repository because we dereference the_hash_algo to figure out its hex length.
    Starting with c8aed5e ("repository: stop setting SHA1 as the default object hash", 2024-05-07, Git v2.46.0 -- merge listed in batch #9) though, we stopped initializing the_hash_algo outside of Git repositories.

Fix both of these issues by not making it an error anymore when the given length exceeds the hash length.
Instead, leave the abbreviated length intact.
repo_find_unique_abbrev_r() handles this just fine except for a performance penalty which we will fix in a subsequent commit.

Upvotes: 3

VonC

Reputation: 1323175

Note that with Git 2.12 (Q1 2017), you can add git diff --no-index to the list of commands with --no-abbrev:

See commit 43d1948 (06 Dec 2016) by Jack Bates (jablko).
(Merged by Junio C Hamano -- gitster -- in commit c89606f, 19 Dec 2016)

diff: handle --no-abbrev in no-index case

There are two different places where the --no-abbrev option is parsed, and two different places where SHA-1s are abbreviated.
We normally parse --no-abbrev with setup_revisions(), but in the no-index case, "git diff" calls diff_opt_parse() directly, and diff_opt_parse() didn't handle --no-abbrev until now. (It did handle --abbrev, however.)
We normally abbreviate SHA-1s with find_unique_abbrev(), but commit 4f03666 ("diff: handle sha1 abbreviations outside of repository", 2016-10-20) recently introduced a special case when you run "git diff" outside of a repository.

setup_revisions() does also call diff_opt_parse(), but not for --abbrev or --no-abbrev, which it handles itself.
setup_revisions() sets rev_info->abbrev, and later copies that to diff_options->abbrev. It handles --no-abbrev by setting abbrev to zero. (This change doesn't touch that.)

Setting abbrev to zero was broken in the outside-of-a-repository special case, which until now resulted in a truly zero-length SHA-1, rather than taking zero to mean do not abbreviate.
The only way to trigger this bug, however, was by running "git diff --raw" without either the --abbrev or --no-abbrev options, because

  1. without --raw it doesn't respect abbrev (which is bizarre, but has been that way forever),
  2. we silently clamp --abbrev=0 to MINIMUM_ABBREV, and
  3. --no-abbrev wasn't handled until now.

The outside-of-a-repository case is one of three no-index cases. The other two are when one of the files you're comparing is outside of the repository you're in, and the --no-index option.

Upvotes: 0

VonC

Reputation: 1323175

Note: git -c core.abbrev=x rebase -i works well for the editor, which will show commit SHA1s abbreviated to x digits.

BUT: rebase was also using that same abbreviated SHA1 internally when starting the rebase itself.

Forcing full hashes with git -c core.abbrev=40 rebase -i to work around that is no longer needed.

See commit edb72d5 from Kirill A. Shutemov, for Git 2.3.1+ (Q1/Q2 2015):

rebase -i: use full object name internally throughout the script

In earlier days, the abbreviated commit object name shown to the end users were generated with hardcoded --abbrev=7; commit 5689503 (rebase -i: respect core.abbrev, 2013-09-28, Git 1.8.5+) tried to make it honor the user specified core.abbrev, but it missed the very initial invocation of the editor.

These days, we try to use the full 40-hex object names internally to avoid ambiguity that can arise after rebase starts running.
Newly created objects during the rebase may share the same prefix with existing commits listed in the insn sheet.
These object names are shortened just before invoking the sequence editor to present the insn sheet to the end user, and then expanded back to full object names when the editor returns.

But the code still used the shortened names when preparing the insn sheet for the very first time, resulting in "7 hexdigits or more" output to the user.

Change the code to use full 40-hex commit object names from the very beginning to make things more uniform.

Note: for an interactive rebase, the "insn sheet" is the instruction sheet. See commit 3322ad4 for illustration.

Upvotes: 0

Roland Smith

Reputation: 43495

Using git -c core.abbrev=40 <command> is supposed to work on all commands because it "will override whatever is defined in the config files".

The -c option seems to have been introduced in commit 8b1fa778676ae94f7a6d4113fa90947b548154dd (landed in version 1.7.2).

Edit2: As phils noticed, the core.abbrev parameter was added in 1.7.4.4.
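For tools that shell out to git, the override can be wrapped once so every call gets it. This is only a sketch: git_full is a hypothetical name, and the value 40 assumes a SHA-1 repository and a git version that already knows core.abbrev.

```shell
# Hypothetical wrapper: run any git command with abbreviation forced to 40 hex digits.
# The -c setting applies to this one invocation only, so the user's own
# configuration (and normal abbreviated output on the command line) is untouched.
git_full() {
    git -c core.abbrev=40 "$@"
}
```

For example, git_full log --oneline or git_full branch -v would then be expected to print unabbreviated hashes, while a plain git log --oneline keeps the usual short form.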

Edit: W.r.t. hardcoded hash lengths, you could always look up the hash lengths by looking at the filename lengths in .git/objects/* when initializing your program/library.
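A minimal sketch of that lookup follows. Loose objects are stored as .git/objects/xx/rest, where xx is the first two hex digits of the hash, so the full length is 2 plus the length of the filename. The function name hash_hex_length is hypothetical, and the approach assumes at least one loose object exists; a fully packed repository would make it fail.

```shell
# Hypothetical helper: infer the hash length (in hex digits) from a loose object.
# $1: path to the .git directory (defaults to ".git").
# Prints the length and returns 0, or returns 1 if no loose object is found.
hash_hex_length() {
    gitdir=${1:-.git}
    for f in "$gitdir"/objects/[0-9a-f][0-9a-f]/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        # Skip anything that is not purely hex (e.g. temporary files).
        case "$name" in
            *[!0-9a-f]*) continue ;;
        esac
        # Two digits come from the directory name, the rest from the filename.
        echo $(( 2 + ${#name} ))
        return 0
    done
    return 1
}
```

Newer versions of Git can also report the hash algorithm directly with git rev-parse --show-object-format, which avoids depending on loose objects at all.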

Upvotes: 15
