Mike Snare

Reputation: 423

If a git branch is pushed to the remote and then deleted before it is merged, is it still part of the repository history?

A branch has been created and pushed to the remote, and a merge request has been opened for it. At some point, it is decided to make significant enough changes that simply starting over in a new branch may be the easiest approach.

When the original branch is deleted from the remote, does any of its history linger in the remote repository, or is it (and any commits associated with it) completely removed from the remote?

Stated more succinctly: if I delete a remote branch without ever merging it into the main branch, and someone then does a git clone of the repo, will they end up downloading any history for that now-deleted branch, or would the clone be 'clean' w.r.t. this branch?

Upvotes: 0

Views: 119

Answers (2)

torek

Reputation: 489045

Stated more succinctly, if I delete a remote branch without ever merging it to the main branch and someone does a git clone of the repo, will they end up downloading any sort of history information about that now-deleted branch, or would the clone be 'clean' w.r.t. this branch?

It's really quite hard to say for certain. In principle, someone cloning the repository should not get any objects that are not reachable from any of the reference-names they clone. In practice, certain server side short-cuts might allow "data leakage".

To make this entirely concrete with examples, consider a repository R on some remote host you are calling origin. R itself had just a master branch with commit hash 1234567.... You then did git push origin newbranch, adding some objects (including at least one new commit) to R and getting the Git at origin to add the new branch newbranch with hash ID fedcba9.... Then you said: "oops, no, we don't want that" and did git push origin :newbranch to delete the name newbranch.

Provided there are no reflogs on the server, commit fedcba9... and its objects (and any earlier unreachable commits) are now eligible for garbage collection, as Ryan noted in a comment. If these all do get garbage-collected, they are truly gone and cannot be observed by any normal means.[1]

If not, though, they are still in R; they just have no name by which they can be found. Or, if the commit hash or hashes are saved in reflogs, they do have names, but these are reflog names, not regular references. The git clone and git fetch processes do not obtain reflogs and hence cannot "see" the hash ID, but until the reflog entries expire, commit fedcba9... and related objects are not yet eligible for garbage collection.
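A minimal local sketch of that situation, assuming a bare repository in a temporary directory stands in for R (the directory names, file names, and commit identity are placeholders, not anything from the question). A hosting service may keep reflogs or collect garbage on its own schedule, so this only illustrates what plain Git does by default:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# R.git is a local bare repository standing in for the remote R (an assumption).
git init -q --bare "$work/R.git"
git --git-dir="$work/R.git" symbolic-ref HEAD refs/heads/master

# A throwaway working repository playing the role of "you".
git init -q "$work/dev"
cd "$work/dev"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false
git remote add origin "$work/R.git"

echo base > file.txt
git add file.txt
git commit -q -m "base commit"
git push -q origin master

# Push newbranch, then delete the name again.
git checkout -q -b newbranch
echo oops > unwanted.txt
git add unwanted.txt
git commit -q -m "commit on newbranch"
gone=$(git rev-parse HEAD)
git push -q origin newbranch
git push -q origin :newbranch

# Is the branch name still present in R?
if git --git-dir="$work/R.git" show-ref --verify --quiet refs/heads/newbranch; then
  ref_state=present
else
  ref_state=deleted
fi

# Is the commit object itself still stored in R?
if git --git-dir="$work/R.git" cat-file -e "$gone^{commit}" 2>/dev/null; then
  object_state=present
else
  object_state=missing
fi

echo "refs/heads/newbranch: $ref_state; commit object: $object_state"
```

With a stock Git and no garbage collection in between, I would expect the name to be reported as deleted while the commit object is still present in R.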

Now we move on to the git clone process. When you run:

git clone <options> <url>

you get your Git to call up another Git at the given URL. Your Git then requests, from that Git, a list of all its normal (non-reflog) references. For repo R, that's just refs/heads/master = 1234567..., since refs/heads/newbranch is gone. Your Git then requests the objects reachable from some or all of these names, such as master: all of them for a full clone, or only some of them for a shallow clone made with --depth. Since fedcba9... was never reachable from master, their Git should not send it.
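The same flow can be sketched locally. This assumes a bare repository in a temporary directory stands in for R, and uses a file:// URL so that the clone goes through the normal transport rather than a plain directory copy; all names below are placeholders:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in for the remote repository R (an assumption for this sketch).
git init -q --bare "$work/R.git"
git --git-dir="$work/R.git" symbolic-ref HEAD refs/heads/master

git init -q "$work/dev"
cd "$work/dev"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false
git remote add origin "$work/R.git"

echo base > file.txt
git add file.txt
git commit -q -m "base commit"
git push -q origin master

git checkout -q -b newbranch
echo oops > unwanted.txt
git add unwanted.txt
git commit -q -m "commit on newbranch"
gone=$(git rev-parse HEAD)
git push -q origin newbranch
git push -q origin :newbranch

# What the other Git advertises: branch names only, no reflogs.
url="file://$work/R.git"
advertised=$(git ls-remote "$url" 'refs/heads/*' | cut -f2)
echo "advertised branches: $advertised"

# Clone through the transport and look for the deleted branch's commit.
git clone -q "$url" "$work/fresh"
if git -C "$work/fresh" cat-file -e "$gone^{commit}" 2>/dev/null; then
  clone_has=yes
else
  clone_has=no
fi
echo "clone contains deleted commit: $clone_has"
```

Here only refs/heads/master should be advertised, and the fresh clone should not contain the commit that was only ever on newbranch.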

There is, however, the option of doing a short-cut on the server. The server has these "pack" files that contain every object that is not a loose object. If there's just one pack file with everything, and no loose objects ... well: If the server could just send you that one "pack" file that has everything, why, that would save the server a lot of time and effort it would otherwise spend building a new, customized pack file that has just the objects you want. If you've asked for all the refs, you must want everything. There's just the one pack and no loose objects; that's everything. There's no reason to spend all that time and effort building a custom pack, right? :-)

So, if the server does this, and if that one big pack file happens to have commit fedcba9... and its related objects in it, then (and only then) you would get a pack file that has fedcba9... and its related objects. It would then be up to your Git to repack this and throw out the unwanted objects.
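The server-side pack short-cut is hard to reproduce on demand, but stock Git has a loosely analogous, documented behaviour: cloning from a plain local path copies or hard-links everything under the objects directory instead of building a pack. The sketch below uses that as a stand-in for "the other side just hands over everything it has"; it is not the server short-cut itself, and the paths and names are placeholders:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in for the remote repository R (an assumption for this sketch).
git init -q --bare "$work/R.git"
git --git-dir="$work/R.git" symbolic-ref HEAD refs/heads/master

git init -q "$work/dev"
cd "$work/dev"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false
git remote add origin "$work/R.git"

echo base > file.txt
git add file.txt
git commit -q -m "base commit"
git push -q origin master

git checkout -q -b newbranch
echo oops > unwanted.txt
git add unwanted.txt
git commit -q -m "commit on newbranch"
gone=$(git rev-parse HEAD)
git push -q origin newbranch
git push -q origin :newbranch

# Clone by local path: Git copies or hard-links the whole object store,
# so unreachable objects come along even though no ref names them.
git clone -q "$work/R.git" "$work/copy"
if git -C "$work/copy" cat-file -e "$gone^{commit}" 2>/dev/null; then
  local_copy_has=yes
else
  local_copy_has=no
fi
echo "local-path clone contains deleted commit: $local_copy_has"
```

Passing --no-local, or using a file:// URL, makes the same clone go through the regular transport, which should leave the unreachable commit behind.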

Does this ever happen in practice? I have seen something like this in the past, but I can't say under exactly what circumstances it occurred. Different versions of Git have had different code in them. Big, heavily-used servers like GitHub (there are others as well) may have customized hacks in their Git implementations (many of these, such as pack bitmaps, have gone into regular Git over time).

The short version is that if any deleted data items are sensitive (passwords and the like), you should consider them compromised. If you are just worried that there might be extra data downloaded, well, you have no real control over that: hope that the host with R either doesn't short-cut pack-building this way, or has done the appropriate garbage collection already.


[1] In other words, file system backups, snapshots, or, e.g., NSA-grade data recovery tools might still be able to find them, but Git won't.

Upvotes: 3

Mark Adelsberger

Reputation: 45719

Deleting the remote branch does not delete the commits; they might eventually be cleaned up by a git gc run. If the remote is on a local shared drive or something of the sort, then it is up to you if and when to run gc on it; and even then, a gc run can only remove a commit if nothing (including reflogs) can reach that commit. If the repository lives on a server (something like TFS) or is hosted (by GitHub or similar), then whether and how you can run gc depends on the options exposed by the server or hosting service.
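For the case where you control the remote yourself, a minimal sketch, assuming a bare repository in a temporary directory stands in for the remote and that it keeps no reflogs (the default for a bare repository). The --prune=now option removes unreachable objects immediately instead of waiting out the usual grace period; all names are placeholders:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in for a remote you administer yourself (an assumption for this sketch).
git init -q --bare "$work/R.git"
git --git-dir="$work/R.git" symbolic-ref HEAD refs/heads/master

git init -q "$work/dev"
cd "$work/dev"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.invalid"
git config commit.gpgsign false
git remote add origin "$work/R.git"

echo base > file.txt
git add file.txt
git commit -q -m "base commit"
git push -q origin master

git checkout -q -b newbranch
echo oops > unwanted.txt
git add unwanted.txt
git commit -q -m "commit on newbranch"
gone=$(git rev-parse HEAD)
git push -q origin newbranch
git push -q origin :newbranch

# Report whether the deleted branch's commit is still stored in the remote.
has_commit() {
  if git --git-dir="$work/R.git" cat-file -e "$gone^{commit}" 2>/dev/null; then
    echo present
  else
    echo missing
  fi
}

before_gc=$(has_commit)
# Collect garbage in the remote, pruning unreachable objects right away.
git --git-dir="$work/R.git" gc --quiet --prune=now
after_gc=$(has_commit)

echo "before gc: $before_gc; after gc: $after_gc"
```

If the commit were still named by a reflog entry, a tag, or another branch, gc would keep it regardless of the prune setting.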

So the commits may stick around for a while, and may be accessible to someone who knows what they're looking for.

But you asked specifically about someone cloning/pulling. The pack sent on pull or clone does not include anything that isn't reachable from some ref. In this case, even if there's a reflog entry, it's not relevant. If you delete the remote branch, and none of the commits in question have been tagged, merged, or had another branch created on them, then the commits should not be included in pull or clone operations.

Upvotes: 3
