plugwash

Reputation: 10514

Under what circumstances will git replacements be pushed/pulled?

I need to make some local tweaks to history to make merges go more smoothly. Traditionally the tool for doing this is grafts, but git is yelling at me that grafts are deprecated and will be removed in a future release, and that I should use git-replace instead.

However, one thing concerns me. https://git.wiki.kernel.org/index.php/GraftPoint says:

As of Git 1.6.5, the more flexible git replace has been added, which allows you to replace any object with any other object, and tracks the associations via refs which can be pushed and pulled between repos.

However, the git-replace man page makes no mention of push or pull.

Under what circumstances will replacements be pushed or pulled, and what steps, if any, do I need to take to ensure that my replacements stay local?

Upvotes: 4

Views: 568

Answers (2)

torek

Reputation: 489143

The answer is really two or three parts, depending on how you want to subdivide them:

  • How replacements work via the refs/replace/ namespace
  • How refs can be transferred
  • How refs are transferred by default

I'll repeat this sentence from the bottom of the long section here, before launching into the long answer:

You can either add refs/replace/*:refs/replace/*—perhaps with an additional leading +—to your remote.origin.fetch lines, to get replacements fetched by default, or do a one-time command-line fetch with a refspec.

This is for git fetch only. You can do a one-time command-line git push of a newly created replacement ref to send the replacement "upstream" to origin or any other upstream repository.

Reference namespaces and how replacements work

A Git replacement object doesn't actually replace the other object, but rather sort of augments it. That is, in Git, every object (commit, tree, blob, or annotated tag) has a unique hash ID. The hash ID is the "true name" of the underlying object.1 Internally, Git starts with an object ID (or gets one from somewhere, anyway) and uses the OID to find the underlying object.

Meanwhile, refs or references are mostly2 any string that (a) starts with refs/ and (b) meets the constraints imposed by git check-ref-format. The next word, with words being separated by slashes, in the refs/namespace/and/more/text string, defines the namespace of this ref. If the word is heads, the ref lives in the branch names namespace. If the word is tags, the ref lives in the tags namespace, and so on.

A valid-and-sensible replacement ref starts with refs/replace/. This is the fixed part of a replacement ref. The remainder of the ref (the variable part) is the hash ID (OID) of the original Git object. So if you have a commit whose hash ID is a12345678901234567899012345678990123456789 and you want to have a replacement ref for that commit, you'd create a ref with that string prefixed by refs/replace/.

Now, each ref should itself contain one valid Git hash ID. (Some refs, such as branch names, are constrained to contain only commit hash IDs. Others, such as tags, may contain the hash ID of any valid Git object.) A replacement should contain the valid hash ID of an object of the same type as the object whose name makes up the variable part of the replacement ref. So in this case, with refs/replace/a12345678901234567899012345678990123456789 replacing a commit, it should contain the hash ID of some other commit.

That other commit is the replacement for the original commit. When Git is about to look up any object by hash ID, Git can first check to see if some refs/replace/* ref exists with that OID as its name's variable part. If so, that bit of Git code will instead look up the object whose hash ID is stored in the replacement ref. In other words, there's code that resembles:

lookup(oid: string, allow_replacements: bool): internalObject {
    replacer = "refs/replace/" + oid
    if allow_replacements and exists_as_ref(replacer) {
        oid = read_ref(replacer)
    }
    return lookup_noreplace(oid)
}

(where lookup_noreplace is what we might think of when we think about lookup: it finds the actual object and returns some sort of handle for it).
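For anyone who wants to poke at this, here is the same idea as a small runnable Python toy. It is only a sketch, not Git's real code: the dictionaries stand in for the object database and the ref store, and the short OIDs aaa111 and bbb222 are made up for illustration.

```python
# Toy model of replacement-aware lookup; not Git's real implementation.
# The OIDs below are hypothetical and far shorter than real hash IDs.
objects = {
    "aaa111": "original commit",
    "bbb222": "replacement commit",
}
refs = {
    # refs/replace/<original OID> holds the OID of the replacement object.
    "refs/replace/aaa111": "bbb222",
}

def lookup_noreplace(oid):
    """Return the object stored under exactly this OID."""
    return objects[oid]

def lookup(oid, allow_replacements=True):
    """Return the object for oid, honouring refs/replace/ when allowed."""
    replacer = "refs/replace/" + oid
    if allow_replacements and replacer in refs:
        oid = refs[replacer]
    return lookup_noreplace(oid)

print(lookup("aaa111"))                            # replacement commit
print(lookup("aaa111", allow_replacements=False))  # original commit
```

The second call plays the role of git --no-replace-objects: the ref still exists, but the lookup ignores it.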

Hence, replacements work by the existence of references (or refs) whose names are in the refs/replace/ namespace. Git works along more or less as normal, then at the last moment, veers off and fetches the replacement object instead of the original object. You can force Git not to do this (git --no-replace-objects ...), and various commands such as git fsck and git gc internally avoid doing this as appropriate for correctness, but most Git commands do allow replacements.


1This is going to get ... interesting when the Great Hash Function Changeover happens.

2Special names like HEAD, CHERRY_PICK_HEAD, ORIG_HEAD, and so on are sometimes treated as references, and sometimes not. Obviously these do not start with refs/.


git fetch and git push transfer objects and set refs

Both git fetch and git push work by having one Git call up another Git. Each Git has its own references, private to each Git. They can share objects, by making copies of each other's objects; they will use the same OIDs for the same objects. (See footnote 1 again.)

The protocol details differ slightly for fetch and push, but in general, the sender kicks the whole thing off by offering up some name(s) and OID(s): one OID per name, since this is how refs work. The receiver can pick over the names and OIDs if desired. In general, the receiver looks at one or both, depending on the situation. If the receiving Git wants the underlying object, it says so. If the receiving Git already has that object, it says so. If the receiving Git wants an object, and the object is of some type that refers to additional objects—not a blob object, in other words—the sender should then offer the referred-to objects as well.3 In this way, the sender and receiver can agree on a minimal subset of objects that the sender must send. The sender then builds a so-called thin pack:4 you'll see "delta compressing" messages here. The receiver collects it and fixes it up or takes the objects out of it as needed, and puts the resulting pack and/or objects into its repository.
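The "in principle" version of that negotiation can be sketched in a few lines of Python. This is the idealized full graph walk, not what Git actually does (see footnote 3 about the short-cuts), and the object names such as c2 and t2 are invented for the example.

```python
# Toy object graph: each OID maps to the OIDs it refers to.
# Names like "c2" and "t2" are made up for this sketch.
graph = {
    "c2": ["c1", "t2"],   # commit c2: parent c1, tree t2
    "c1": ["t1"],         # commit c1: tree t1
    "t2": ["b1", "b2"],   # tree t2: blobs b1 and b2
    "t1": ["b1"],         # tree t1: blob b1
    "b1": [],
    "b2": [],
}

def objects_to_send(wanted, receiver_has):
    """Walk from the wanted OIDs, skipping anything the receiver has."""
    to_send, stack = set(), list(wanted)
    while stack:
        oid = stack.pop()
        if oid in receiver_has or oid in to_send:
            continue
        to_send.add(oid)
        stack.extend(graph[oid])
    return to_send

receiver_has = {"c1", "t1", "b1"}
print(sorted(objects_to_send(["c2"], receiver_has)))  # ['b2', 'c2', 't2']
```

Only the new commit, its new tree, and the one new blob end up in the set to send; everything the receiver already has is skipped.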

Now that the receiver has the necessary objects, such as new commits with new files / content, the receiver needs to set some names to remember these objects. This is where refs return to the picture.

If the receiver is a Git running git fetch, normally, this receiver sets remote-tracking names, such as refs/remotes/origin/master, to the same hash ID as the branch names it got from the sender. The receiver may or may not take all tags or some tags, using rather complicated rules. The receiver ignores all the other refs, including the refs/replace/ ones.

If the sender is a Git running git push, whoever runs git push chooses which names to send. The git push syntax allows the sender to choose an arbitrary name on the receiver; it's up to the receiver to accept or reject this name. The default is to use the same name, and to push either the current branch, or some set of branches, depending on Git vintage and push.default settings.

If you, as the user of the computer, are running git fetch or git push yourself, you can supply all of the refs involved on both sides of the operation here, using what Git calls refspecs.

Using and setting refspecs

The full form of a refspec is:

  • an optional leading plus sign +,
  • a source ref,
  • a colon :, and
  • a destination ref.

The leading plus sign sets the force flag for this particular refspec: it asks whichever Git is receiving to set its (destination) ref even if this would violate the usual rules for this ref, whatever those usual rules may be. Using git fetch --force or git push --force sets this same flag on every refspec even if it's not there in the refspec itself.

The source ref is the ref that the sender uses to find the OID to send. The destination ref is the name the receiver uses. If you're running git fetch, you're setting your own Git's receive-side name; if you're running git push, you're setting the name your Git hands to their Git.

More precisely:

  • When you run git fetch, they send you their names-and-OIDs at the start of the session, and your Git combs through those to find the names that match the sources of your command-line refspecs. Your Git then receives the objects and writes to your names, using the destination refs you gave in those refspecs.

  • When you run git push, you choose which OIDs to send to their Git using the source names in the refspecs you give, and you choose what names to give to their Git using the destination names in the refspecs you give.
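To make the three parts of a refspec concrete, here is a rough Python sketch that splits one apart and maps a source name to its destination. The names Refspec, parse_refspec and map_ref are mine, not Git's, and the sketch handles only the full src:dst form with at most one * on each side.

```python
from typing import NamedTuple, Optional

class Refspec(NamedTuple):
    force: bool
    src: str
    dst: str

def parse_refspec(text: str) -> Refspec:
    """Split '[+]<src>:<dst>' into force flag, source, and destination."""
    force = text.startswith("+")
    if force:
        text = text[1:]
    src, sep, dst = text.partition(":")
    if not sep:
        raise ValueError("this sketch handles only the full src:dst form")
    return Refspec(force, src, dst)

def map_ref(spec: Refspec, name: str) -> Optional[str]:
    """Return the destination name for a source ref, or None if no match."""
    if "*" not in spec.src:
        return spec.dst if name == spec.src else None
    prefix, suffix = spec.src.split("*", 1)
    if (len(name) >= len(prefix) + len(suffix)
            and name.startswith(prefix) and name.endswith(suffix)):
        middle = name[len(prefix):len(name) - len(suffix)]
        return spec.dst.replace("*", middle, 1)
    return None

spec = parse_refspec("+refs/heads/*:refs/remotes/origin/*")
print(spec.force)                            # True
print(map_ref(spec, "refs/heads/master"))    # refs/remotes/origin/master
print(map_ref(spec, "refs/replace/a123"))    # None
```

For git fetch the source side is matched against their names and the destination is written in your repository; for git push the roles are swapped, but the mapping itself is the same.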

If you don't give refspecs, your Git computes some default ones. Again, this varies for git fetch and git push:

  • With git fetch, your Git looks up the remote name you used, such as origin. Under this remote, there should be one or more settings for remote.<remote>.fetch. The (single) standard setting for origin is:

     remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
    

This refspec is why their branch names become your remote-tracking names.

  • With git push, the default refspec is based on push.default. The default value of push.default is different in Git versions predating 2.0, where it defaults to matching, than in 2.0+, where it defaults to simple. With current or simple, the refspec itself is name:name where name is the current branch name. However, simple first checks that name has an upstream set. If not, the push operation fails. Otherwise, it checks that the upstream name is the same as the branch name. If not, the push operation fails.
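The current and simple rules just described can be written out as a small Python function. This is a simplified model of the description above, not Git's code: default_push_refspec is a hypothetical name, and upstream here means just the branch name of the configured upstream (or None if there is none).

```python
def default_push_refspec(mode, branch, upstream=None):
    """Return the 'name:name' refspec for push.default current or simple.

    upstream is the branch name of the configured upstream, or None if unset.
    """
    if mode == "current":
        return f"{branch}:{branch}"
    if mode == "simple":
        if upstream is None:
            raise RuntimeError("no upstream set; push fails")
        if upstream != branch:
            raise RuntimeError("upstream name differs; push fails")
        return f"{branch}:{branch}"
    raise ValueError("mode not modelled in this sketch")

print(default_push_refspec("current", "topic"))                   # topic:topic
print(default_push_refspec("simple", "topic", upstream="topic"))  # topic:topic
```

Whatever the mode, the only name that ever comes out is the current branch name, which is why a plain git push has no way to pick up a refs/replace/ ref.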

Note that none of these built-in defaults ever matches refs/replace/*. However:

  • When fetching to a mirror clone (a fetch mirror), the default refspec is instead:

     +refs/*:refs/*
    

This does match refs/replace/*, so replacement objects get copied. Their ref names are unchanged, so they act as replacements in the usual way.

  • When using a push mirror—these are specialized and even rarer than fetch mirrors—you would also push with refs/*:refs/* (perhaps with a leading +). So this would also push replacement objects, maintaining their replacement-ness.

You can either add refs/replace/*:refs/replace/*—perhaps with an additional leading +—to your remote.origin.fetch lines, to get replacements fetched by default, or do a one-time command-line fetch with a refspec.
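To see the difference between the three fetch configurations side by side, here is a small Python simulation. It is a sketch only: the advertised refs and shortened OIDs are hypothetical, map_ref handles just a trailing * on each side, and tag handling is ignored entirely.

```python
def map_ref(refspec, name):
    """Map a source ref through '[+]<src>:<dst>' (trailing '*' globs only)."""
    src, dst = refspec.lstrip("+").split(":", 1)
    if src.endswith("*") and dst.endswith("*"):
        prefix = src[:-1]
        if name.startswith(prefix):
            return dst[:-1] + name[len(prefix):]
        return None
    return dst if name == src else None

def fetched_refs(advertised, refspecs):
    """Return {local name: OID} for every advertised ref a refspec matches."""
    result = {}
    for name, oid in advertised.items():
        for refspec in refspecs:
            local = map_ref(refspec, name)
            if local is not None:
                result[local] = oid
    return result

# Hypothetical advertisement from the other Git; OIDs are shortened.
advertised = {
    "refs/heads/master": "1111",
    "refs/replace/a123": "2222",
}

default = ["+refs/heads/*:refs/remotes/origin/*"]
mirror = ["+refs/*:refs/*"]
with_replace = default + ["+refs/replace/*:refs/replace/*"]

for label, specs in [("default", default), ("mirror", mirror),
                     ("with replace", with_replace)]:
    print(label, sorted(fetched_refs(advertised, specs)))
```

With the default refspec, only refs/remotes/origin/master gets written, so the replacement ref stays on the other side. The mirror refspec and the extra refs/replace/* line both bring it across under its original name.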


3For practical reasons, there are a bunch of short-cuts taken here with tree objects, so that the full graph analysis isn't done in most cases. This usually leads to better performance but can result in a sender over-sending. If Git used the obvious graph walk all the way down, Git could do a more perfect job here, at the expense of exchanging a lot more OIDs in the slower initial offer/have/want style phase. That's usually not a good tradeoff, which is why the short-cuts are used. But in principle it works like this, at least.

4A thin pack is one that is delta-compressed against objects that are not in the pack itself. The sender knows which commits (and thus which blob objects, i.e., file contents) the receiver has, at this point, so the sender can delta-compress to-be-sent objects against those objects that the receiver already has, without sending those objects. That's a "thin pack".

Upvotes: 3

plugwash

Reputation: 10514

I also posted this question on the git mailing list and got an answer there. It's a bit long to quote in full, but the gist of it is that replace references are local-only by default and special steps need to be taken to share them.

https://marc.info/?l=git&m=157893765418386&w=2

Upvotes: 0
