Mitar

Reputation: 7030

What is git reflog identity?

In git, I see that when using custom formatting for git log, there are "reflog identity" values possible. What is a reflog identity?

Upvotes: 3

Views: 3138

Answers (3)

VonC

Reputation: 1323963

Note that how Git stores a reflog entry (including its identity) is about to change (2020, 4 years later).

With Git 2.29 (Q4 2020), there is a preliminary clean-up of the refs API in preparation for adding a new refs backend, "reftable".

See commit 523fa69 (10 Jul 2020) by Junio C Hamano (gitster).
See commit de966e3, commit ce57d85 (10 Jul 2020), and commit 9e35a6a (30 Jun 2020) by Han-Wen Nienhuys (hanwen).
(Merged by Junio C Hamano -- gitster -- in commit 3161cc6, 30 Jul 2020)

reflog: cleanse messages in the refs.c layer

Signed-off-by: Han-Wen Nienhuys

Regarding reflog messages:

  • We expect that a reflog message consists of a single line.
    The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message.
  • We however allow callers of the refs API to supply a random sequence of NUL-terminated bytes.
    We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message.
    This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both).
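To make the cleansing rule concrete, here is a minimal Python sketch of it, assuming "whitespace" means the C-locale isspace() set (space, \t, \n, \v, \f, \r). The name cleanse_reflog_message is hypothetical; this is not Git's C code, which may differ in details (for example, in how it treats leading whitespace).

```python
import re

# In a bytes pattern, \s matches exactly the C-locale isspace() bytes:
# space, \t, \n, \r, \f, \v.
_WHITESPACE_RUN = re.compile(rb"\s+")


def cleanse_reflog_message(msg: bytes) -> bytes:
    """Hypothetical sketch of the described cleansing: squash every run of
    whitespace into a single SP, then trim trailing whitespace."""
    squashed = _WHITESPACE_RUN.sub(b" ", msg)
    return squashed.rstrip(b" ")


if __name__ == "__main__":
    # An embedded LF and a trailing LF are tolerated instead of rejected.
    print(cleanse_reflog_message(b"commit:  fix\nthe   bug\n"))
```

Run as a script, it prints b'commit: fix the bug': the double space, the embedded LF and the trailing LF all end up as (at most) a single SP.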

Currently, the cleansing of the reflog message is done by the files backend, before the log is written out.
This is sufficient with the current code, as that is the only backend that writes reflogs.
But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" to be the same no matter what backend is used; moving the code that does so to the generic layer is a way to achieve that.

An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer.

Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message AND append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record.
The reading side can detect the presence of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message.
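The envisioned scheme can be sketched as follows. This is purely hypothetical (the commit says nobody is asking for multi-line messages yet): the function names and the exact set of bytes treated as "problematic" are my assumptions, and the %-encoding uses Python's urllib.parse rather than anything in Git.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Printable ASCII kept readable. '%' is deliberately absent, so it is always
# escaped; control bytes (LF, TAB, ...) and non-ASCII bytes are escaped too.
_SAFE = " !\"#$&'()*+,-./:;<=>?@[\\]^_`{|}~"


def encode_reflog_message(msg: bytes) -> str:
    """Hypothetical new-style writer: %-encode problem bytes, then append
    exactly one SP as a marker."""
    return quote_from_bytes(msg, safe=_SAFE) + " "


def decode_reflog_message(stored: str) -> bytes:
    """Hypothetical reader. Existing Git always rtrims, so a trailing SP can
    only come from a new-style writer and signals that %-decoding is needed."""
    if stored.endswith(" "):
        return unquote_to_bytes(stored[:-1])  # drop the marker, then decode
    return stored.encode("utf-8")  # legacy record: use verbatim
```

For example, encode_reflog_message(b"line one\nline two") yields "line one%0Aline two ", which decodes back to the original two-line message, while a legacy record such as "commit: foo" (no trailing SP) is returned untouched, even if it happens to contain something that looks like a %-escape.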


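Also with Git 2.29 (Q4 2020), the "\t" that separates each reflog entry's message from its identity and timestamp is added by the files backend only, no longer by the generic refs layer.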

See commit 25429fe (31 Jul 2020) by Han-Wen Nienhuys (hanwen).
(Merged by Junio C Hamano -- gitster -- in commit dc3c6fb, 01 Aug 2020)

refs: move the logic to add \t to reflog to the files backend

Signed-off-by: Han-Wen Nienhuys

523fa69c ("reflog: cleanse messages in the refs.c layer", 2020-07-10, Git v2.29.0 -- merge) centralized reflog normalization.
However, the normalization added a leading "\t" to the message.
This is an artifact of the reflog storage format in the files backend, so it should be added there.

Routines that parse back the reflog (such as grab_nth_branch_switch) expect the "\t" to not be in the message, so without this fix, git reflog with reftable cannot process the "@{-1}" syntax.
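Why a stray leading "\t" breaks @{-1}: the lookup scans HEAD's reflog, newest first, for "checkout: moving from X to Y" messages and takes X from the n-th match. The sketch below models that behaviour in Python; nth_prior_checkout is a hypothetical name, not Git's grab_nth_branch_switch itself.

```python
from typing import List, Optional

PREFIX = "checkout: moving from "


def nth_prior_checkout(messages_newest_first: List[str], n: int) -> Optional[str]:
    """Hypothetical model of resolving @{-n} from HEAD's reflog messages.

    Every "checkout: moving from X to Y" message counts once; the n-th one
    (newest first) names the branch X that was checked out before."""
    remaining = n
    for message in messages_newest_first:
        # A leading "\t" left in the message would make this test fail
        # for every entry, so no branch switch would ever be found.
        if not message.startswith(PREFIX):
            continue
        rest = message[len(PREFIX):]
        target = rest.find(" to ")
        if target < 0:
            continue
        remaining -= 1
        if remaining == 0:
            return rest[:target]
    return None
```

With the three HEAD messages from the git reflog output shown in torek's answer (the commit, then "moving from master to torturetest", then "moving from torturetest to master"), @{-1} resolves to master and @{-2} to torturetest; prefixing each message with "\t" makes every lookup return None.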


With Git 2.46 (Q3 2024), the new reftable backend, which also stores reflog entries (and therefore their identity), gets more configuration: the knobs to tweak how reftable files are written have been made available as configuration variables.

See commit f518d91, commit f663d34, commit afbdbfa, commit 90db611, commit 8e9e136, commit 831b366, commit fcf3418, commit c22d75b, commit e0cf3d8, commit 7992378, commit 4d35bb2 (13 May 2024) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit 23528d3, 30 May 2024)

refs/reftable: allow disabling writing the object index

Signed-off-by: Patrick Steinhardt

Besides the expected "ref" and "log" records, the reftable library also writes "obj" records.
These are basically a reverse mapping of object IDs to their respective ref records so that it becomes efficient to figure out which references point to a specific object.
The motivation for this data structure is the "uploadpack.allowTipSHA1InWant" config, which allows a client to fetch any object by its hash that has a ref pointing to it.
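A simplified Python model of what the "obj" records buy you: a reverse index from object ID to refs, so an allowTipSHA1InWant-style check becomes a single lookup instead of a scan over every ref. As far as I understand the reftable format, real obj records are keyed by abbreviated object-ID prefixes and point at ref block positions rather than at ref names; the function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_object_index(refs: Dict[str, str]) -> Dict[str, List[str]]:
    """Hypothetical sketch: map each object ID to the refs pointing at it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for refname, oid in sorted(refs.items()):
        index[oid].append(refname)
    return dict(index)


def tip_allowed(index: Dict[str, List[str]], oid: str) -> bool:
    """A wanted object is acceptable under a tip-only policy if some ref
    points at it: one dictionary lookup instead of walking all refs."""
    return oid in index
```

Without such an index, the same check needs a pass over every ref, which is exactly the work Git skips when reftable.indexObjects is false and no tip-only check is needed.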

This reverse index is not used by Git at all though, and the expectation is that most hosters nowadays use "uploadpack.allowAnySHA1InWant".
It may thus be preferable for many users to disable writing these optional object indices altogether to save some precious disk space.

Add a new config "reftable.indexObjects" that allows the user to disable the object index altogether.

git config now includes in its man page:

reftable.blockSize

The size in bytes used by the reftable backend when writing blocks.
The block size is determined by the writer, and does not have to be a power of 2. The block size must be larger than the longest reference name or log entry used in the repository, as references cannot span blocks.

Powers of two that are friendly to the virtual memory system or filesystem (such as 4kB or 8kB) are recommended. Larger sizes (64kB) can yield better compression, with a possible increased cost incurred by readers during access.

The largest block size is 16777215 bytes (15.99 MiB).
The default value is 4096 bytes (4kB). A value of 0 will use the default value.
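The documented rules for reftable.blockSize (0 means the 4096-byte default, the maximum is 16777215 bytes, and every record must fit in one block) can be summarized as a small validation function. This is a hypothetical Python sketch of those rules, not Git's own validation code.

```python
DEFAULT_BLOCK_SIZE = 4096  # 4kB default, per the documentation above
MAX_BLOCK_SIZE = 16777215  # 2**24 - 1 bytes, about 15.99 MiB


def effective_block_size(configured: int, longest_record: int) -> int:
    """Hypothetical check mirroring the documented reftable.blockSize rules."""
    size = DEFAULT_BLOCK_SIZE if configured == 0 else configured
    if not 0 < size <= MAX_BLOCK_SIZE:
        raise ValueError(f"block size {size} out of range")
    # References (and log entries) cannot span blocks.
    if longest_record >= size:
        raise ValueError("a ref name or log entry would not fit in one block")
    return size
```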

reftable.restartInterval

The interval at which to create restart points. The reftable backend determines the restart points at file creation. Every 16 may be more suitable for smaller block sizes (4k or 8k), every 64 for larger block sizes (64k).

More frequent restart points reduce prefix compression and increase space consumed by the restart table, both of which increase file size.

Less frequent restart points make prefix compression more effective, decreasing overall file size, with increased penalties for readers walking through more records after the binary search step.

A maximum of 65535 restart points per block is supported.

The default value is to create restart points every 16 records. A value of 0 will use the default value.
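To see the trade-off the restartInterval text describes, here is a simplified Python model of prefix compression with restart points: each record stores only the length of the prefix it shares with the previous key plus the remaining suffix, except at restart points, which store the full key so readers can binary-search them and then walk forward. The real reftable block layout (varint encoding, byte offsets, a restart table at the end of the block) is more involved; all names here are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # (length of prefix shared with previous key, suffix)


def _common_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def encode_block(keys: List[str], restart_interval: int = 16) -> Tuple[List[Record], List[int]]:
    """Prefix-compress sorted keys; every restart_interval-th record is a
    restart point that stores its full key (shared prefix length 0)."""
    if restart_interval < 1:
        raise ValueError("restart_interval must be >= 1")
    records: List[Record] = []
    restarts: List[int] = []  # indices of restart records
    prev = ""
    for i, key in enumerate(keys):
        if i % restart_interval == 0:
            restarts.append(i)
            shared = 0
        else:
            shared = _common_prefix_len(prev, key)
        records.append((shared, key[shared:]))
        prev = key
    return records, restarts


def lookup(records: List[Record], restarts: List[int], wanted: str) -> Optional[int]:
    """Binary-search the full keys at restart points, then walk forward,
    rebuilding each key from its predecessor, until wanted is found or passed."""
    restart_keys = [records[r][1] for r in restarts]  # full keys at restarts
    pos = bisect_right(restart_keys, wanted) - 1
    if pos < 0:
        return None
    i = restarts[pos]
    key = ""
    while i < len(records):
        shared, suffix = records[i]
        key = key[:shared] + suffix
        if key == wanted:
            return i
        if key > wanted:
            return None
        i += 1
    return None
```

A smaller restart_interval means more records carry full keys (less compression) but shorter forward walks after the binary search; a larger one does the opposite, which is the same trade-off the two paragraphs above describe.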

reftable.indexObjects

Whether the reftable backend shall write object blocks. Object blocks are a reverse mapping of object ID to the references pointing to them.
The default value is true.

reftable.geometricFactor

Whenever the reftable backend appends a new table to the stack, it performs auto compaction to ensure that there is only a handful of tables.
The backend does this by ensuring that tables form a geometric sequence regarding the respective sizes of each table.

By default, the geometric sequence uses a factor of 2, meaning that for any table, the next-biggest table must be at least twice as big. A maximum factor of 256 is supported.
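The geometric invariant can be illustrated with a simplified Python sketch: the stack of table sizes, oldest first, must satisfy "each table is at least factor times the next newer one", and after appending a table the two newest tables are merged until that holds again. Git's reftable library chooses which tables to compact with its own logic, and a merged table can be smaller than the sum of its inputs (overwritten refs disappear), so treat this as an approximation; append_and_compact is a hypothetical name.

```python
from typing import List


def append_and_compact(sizes: List[int], new_size: int, factor: int = 2) -> List[int]:
    """Simplified sketch. sizes is the stack oldest-to-newest and is assumed
    to already be geometric. Push the new table, then merge the two newest
    tables while the invariant is violated."""
    stack = sizes + [new_size]
    while len(stack) > 1 and stack[-2] < factor * stack[-1]:
        newest = stack.pop()
        stack[-1] += newest  # merged size approximated as the sum of inputs
    return stack
```

For example, with the default factor of 2, appending a table of size 4 to a stack [64, 16, 4] merges the two 4s into 8 and stops, because 16 is at least twice 8.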

Upvotes: 0

torek

Reputation: 488103

These refer to reflog entries. A reflog is simply a record of updates to a reference, and a reference itself is simply a generalization of branch and tag names and special names like HEAD.

Reflogs are normally enabled on client repositories (like yours) and normally disabled on server repositories. This is, naturally enough, configurable. The front-end command people mostly use for looking at their reflogs is git reflog. You can run that now if you like, but doing so won't help explain %ge and so on. So we'll do something different: Run git log -g.

Running git reflog basically runs git log --oneline -g. By running git log -g yourself, you can leave out the --oneline, and hence see more than one line for each reflog entry.

The output will resemble the following, with names and email addresses changed:

commit 08b876daae9944d1a6fba271cfcd9629c13dfd69
Reflog: HEAD@{0} (A U Thor <author@example.com>)
Reflog message: commit: initial torturetest code
Author: A U Thor <author@example.com>
Date:   Sun Aug 7 01:59:31 2016 -0700

    initial torturetest code

commit 8bb118938b5c6a2978f13e74525b594a48226571
Reflog: HEAD@{1} (A U Thor <author@example.com>)
Reflog message: checkout: moving from master to torturetest
Author: Someone Else <someone@example.com>
Date:   Sat Jul 16 02:00:46 2016 +0200

    Allow backend ...

The most recent commit is one I made last night (well, this morning). This is HEAD@{0}. It represents some commit (whose true name is the big ugly SHA-1 hash starting with 08b87...). The commit itself has an author (me, though I changed the name here for display purposes), date, commit message, and so on—but the reflog entry, HEAD@{0}, also has an author (me again), date, and message.

In this case, the commit's author and the reflog author are the same. Even the reflog message is basically the same as the commit subject (the Reflog message: line just has the word commit: inserted). So that's not much help, but take a look at the very next example, commit 8bb11....

This reflog entry has me as the reflog author, and someone else as the commit author.[1] Moreover, the reflog message, checkout: moving from master to torturetest, is completely unrelated to the commit's subject line, which begins with Allow backend.

If you compare this to the short output from git log -g --oneline or git reflog—both of these examine the reflog for HEAD—you'll see only the reflog message, along with the commit ID and the reflog selector.

One other thing is worth noting here. In regular git log output, each commit normally[2] appears only once. In git log -g output, however, a commit can appear repeatedly, because Git is looking at the hash IDs stored in the reflog itself. If you switch back and forth between branches that point to the same commit, or use git reset to change a branch to point back to a commit it pointed to earlier, or run git rebase, or do any number of similar things, you can easily get a reference (this applies to both HEAD and branch names) that points to the same commit in multiple different reflog entries.

In my case, for instance, I apparently vacillated a bit on the name torturetest or something:

08b876d HEAD@{0}: commit: initial torturetest code
8bb1189 HEAD@{1}: checkout: moving from master to torturetest
8bb1189 HEAD@{2}: checkout: moving from torturetest to master

(I'm not really sure what this was about—perhaps just running too many Git commands without remembering which repository I was in. :-) )

Returning directly to your question:

What is a reflog identity?

These are the names and email addresses stored in each reflog entry. In the case of a private Git repository, on your own client, these are likely to all be the same all the time. But since you can run git config --global user.name "New User Name" and git config --global user.email new@address any time to change them,[3] they could vary.


[1] That someone else is also the committer, if you get to wondering. The commit's author and committer, and their corresponding dates and email addresses, are stored in the commit itself. The reflog author, date, and email address are stored in the reflog entry. It's actually a plain text file today, so you can just look at .git/logs/HEAD and .git/logs/refs/heads/master to see the raw reflog data. The format is not particularly well documented, but is pretty obvious: it has the old and new values for the reference; the reflog's author, email, and date-stamp information; and the reflog message.
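Since the files backend stores reflogs as plain text, a short Python sketch can parse one line of .git/logs/HEAD into the pieces that git log -g shows. It assumes the layout described above: old and new hash, the reflog identity (name and <email>), a Unix timestamp with a +hhmm/-hhmm zone, then an optional TAB and the message. The name, email and message fields correspond roughly to the %gn, %ge and %gs placeholders of git log --format; ReflogEntry and parse_reflog_line are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed layout of one line in .git/logs/HEAD (files backend):
# <old-oid> SP <new-oid> SP <name> SP <<email>> SP <epoch> SP <+hhmm> [TAB <message>]
_LINE = re.compile(
    r"^(?P<old>[0-9a-f]+) (?P<new>[0-9a-f]+) "
    r"(?P<name>.*?) <(?P<email>[^>]*)> (?P<epoch>\d+) (?P<tz>[+-]\d{4})"
    r"(?:\t(?P<message>.*))?$"
)


@dataclass
class ReflogEntry:
    old: str
    new: str
    name: str      # reflog identity name (roughly %gn)
    email: str     # reflog identity email (roughly %ge)
    when: datetime
    message: str   # reflog message (roughly %gs)


def parse_reflog_line(line: str) -> ReflogEntry:
    m = _LINE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a reflog line: {line!r}")
    tz = m["tz"]
    offset = timedelta(hours=int(tz[1:3]), minutes=int(tz[3:5]))
    if tz[0] == "-":
        offset = -offset
    when = datetime.fromtimestamp(int(m["epoch"]), timezone(offset))
    # The TAB and message are optional: an empty message writes neither.
    return ReflogEntry(m["old"], m["new"], m["name"], m["email"], when, m["message"] or "")
```

Note that the identity parsed here belongs to the reflog entry, not to the commit it points at, which is exactly the distinction between the Reflog: and Author: lines in the git log -g output above.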

[2] The exception here, besides the one for reflogs themselves of course, occurs when using git log -m -p to split merge commits. Normally git log skips merge commits entirely, while git show shows combined diffs for them. (The documentation on combined diffs is somewhat buried: search here on StackOverflow for the term "combined diffs".)

If you convince git log to show a diff, it too can show a combined diff. In all cases, combined diffs may omit crucial information, so you can tell these commands to do something different: for each parent of a merge, produce a diff of the merge commit's tree against that particular parent's tree. This is what the -m flag does.

When showing a diff of merge commit 1234567... against parent #1, Git shows you the merge commit information, then the diff. Then, when showing a diff of merge commit 1234567... against parent #2, Git shows you the merge commit information again, before the second diff. So this is how git log can show the same commit more than once.

[3] You can also use git -c user.name=whatever and git -c user.email=whatever, or in this particular case, special Git environment variables. Using git -c is especially convenient for one-off tests, as in the answer I wrote recently about Git diff color options.

Upvotes: 4

CodeWizard

Reputation: 142064

git reflog is another command.

Every time HEAD changes, Git stores its old value under the .git/logs folder, and you can view it via the git reflog command.

The meaning of "reflog identity" is simply:

Each commit will be grouped by author and title

Upvotes: 0
