Reputation: 87
If I merge branch A into branch B and then delete A, which branch do the commits from branch A (now deleted) belong to? When I open the link to one of these commits, I see "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."
I tried all the answers to this question, and they didn't solve my problem: Listing and deleting Git commits that are under no branch (dangling?)
What is the solution?
Upvotes: 4
Views: 5525
Reputation: 21
I just encountered the identical issue on GitHub. My issue was that I discovered some remnants in my search after using git filter-branch to delete sensitive data from GitHub. But after I contacted GitHub support, the issue was resolved in 5 minutes.
Upvotes: 1
Reputation: 489748
You can't "delete" this commit. You don't have this commit in the first place, and even if you did, you still wouldn't really be able to delete it.
If I merge branch A into branch B and then delete A, which branch do commits from branch A (now deleted) belong to?
The answer you might want here—and it's not wrong, but it's not right either—is "branch B". Unfortunately, there's a fundamental error in this question. I believe this error itself comes from GitHub's rather misleading claim about a commit not "belong[ing] to any branch on this repository":
⚠️ This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
The mistake in the question itself—and the reason the text above is misleading—is that commits do not owe their existence to the presence of branch names, in Git. In Git, you can have as many commits as you like, and no branch names at all. Commits never "belong to" any branch in the first place.
Instead, a key notion we use with Git is that of reachability. If some commit Ci is reachable from some other commit Cj, in Git repository R, this means that Ci is an ancestor of Cj (or equivalently, Ci ≺ Cj, where "≺" (a sort of bendy less-than sign) is read as "precedes"): this defines a partial order on the commit graph, which is a Directed Acyclic Graph, or DAG.1
We then define branch, or at least branch name, in Git as a reference (or ref) whose name begins with refs/heads/ and whose hash ID is constrained to be that of a commit, with ref itself defined as a name containing a hash ID.2 Hence a name like refs/heads/branch is a branch name, and the hash ID stored in this branch name must be that of some commit.
A commit reaches all its ancestors. Each commit stores a list—usually just one entry long—of previous commit hash IDs. These form commits into chains, with backwards-pointing arrows. Simple cases have just one backwards arrow coming out of each commit, pointing to its predecessor:
A <-B <-C ... <-F <-G <-H
Here, in our simple repository R, we have exactly eight commits. Instead of using Git's actual commit hash IDs, we've given them single uppercase letters. (This scheme is impractical in a real repository: what would we do if there were more than 26 commits? But it's useful for thinking about the issues here.) The last commit we made, H, stores inside itself the hash ID of the second-to-last commit G. We say that H points to G. G stores F's hash ID, so we say that G points to F. This continues, backwards, down the entire chain of commits until we hit commit A. Because it's the very first commit, it can't point backwards, and it doesn't: its list of parent hash IDs is empty.
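To see these backwards links for yourself, here is a minimal sketch that builds such an eight-commit chain in a throwaway repository. The temporary directory, the placeholder identity, and the use of empty commits whose subjects are the letters A through H are all assumptions made for illustration; real hash IDs will differ on every run.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository; the location is arbitrary
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false

# Eight empty commits whose subjects stand in for the letters above.
for letter in A B C D E F G H; do
    git commit -q --allow-empty -m "$letter"
done

# One line per commit: its hash ID, then the hash ID(s) of its parent(s).
git rev-list --parents main

total=$(git rev-list --count main)
parent_of_tip=$(git log -1 --format=%s main^)   # subject of H's parent
# The root commit's line holds a single field: its own hash ID, no parent.
root_fields=$(git rev-list --parents --max-parents=0 main | wc -w)
echo "commits: $total; H points to: $parent_of_tip; fields on root line: $root_fields"
```

Every line printed by git rev-list --parents has two hash IDs except the last one, which belongs to the root commit and has only its own.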
1This particular definition is slightly backwards, because Git itself works backwards. In a normal DAG, reachability would imply successorship, rather than predecessorship. But in Git, all the arrows point backwards, instead of forwards.
2Most refs are spelled refs/*, but there exist pseudo-refs, such as HEAD and CHERRY_PICK_HEAD, that do not. Pseudo-refs are special cases that make things troublesome for the folks working on putting in a proper ref database for Git. Note that pseudo-refs are per-work-tree, but some other refs, such as the bisection refs, are also per-work-tree.
We start with our simple eight-commit repository ending with:
...--G--H <-- main (HEAD)
We've added the branch name main and stored in main the real hash ID of commit H. So we say that main points to H, the same way that H points to G. (For text / ASCII-art purposes on Stack Overflow I've failed to draw the arrow from H to G as an arrow: we just have to remember that commits only link backwards. There's no link from G to H, only vice versa.)
This setup means that the name main allows us to reach any of the eight commits in the repository. Let's now add two more branch names, br1 and br2, both of which point to commit H:
...--G--H <-- br1, br2, main (HEAD)
All three names point to commit H. So all eight commits are reachable from all names. This means that all commits are on all branches.
The HEAD attached to main here means that the branch name we're using is main, and the current commit is therefore commit H. Let's run git checkout or git switch now, to change which name HEAD is attached to:
git switch br1
This results in:
...--G--H <-- br1 (HEAD), br2, main
The only thing that changed at this point is that HEAD is now attached to br1. All eight commits are still there; we're still using commit H; but now we're using H via the name br1.
Now we make a new commit, in the usual way. This new commit gets a new, unique hash ID, but we just call it "commit I" to keep our sanity. To draw it in, we need to draw an arrow from I pointing back to H, and make the name br1 point to I, because that's how Git actually handles this internally:
          I <-- br1 (HEAD)
         /
...--G--H <-- br2, main
We're now using commit I, through name br1.
If we add another new commit J, we get:
          I--J <-- br1 (HEAD)
         /
...--G--H <-- br2, main
We're now using commit J through name br1. Now we switch to br2:
git switch br2
          I--J <-- br1
         /
...--G--H <-- br2 (HEAD), main
We're now using commit H again, through name br2. If we make two more commits, we get:
          I--J <-- br1
         /
...--G--H <-- main
         \
          K--L <-- br2 (HEAD)
Et voilà, we have "branches"! Commits I-J can be said to "belong to" branch br1 and commits K-L can be said to "belong to" branch br2, but what about the commits up through H? Some would say these "belong to" main, but Git makes no such distinction: they're "on" all three branches. When we first made the two br* branch names, all the commits were "on" all three branches, and those commits still are on all three branches. It's just that new commits I-J are only on br1, and new commits K-L are only on br2, at the moment.
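The "on all three branches" versus "only on br1" distinction can be checked directly. This is a sketch under the same assumptions as the drawings: a throwaway repository, empty commits whose subjects are the letters G through L, and a small commit helper function that is purely illustrative.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false
commit() { git commit -q --allow-empty -m "$1"; }   # illustrative helper

commit G; commit H
git branch br1
git branch br2
git switch -q br1; commit I; commit J
git switch -q br2; commit K; commit L

# Branch names from which commit H (the tip of main) is reachable.
on_h=$(git branch --format='%(refname:short)' --contains=main | paste -sd' ' -)
# Commits reachable from one name but from neither of the other two.
only_br1=$(git log --format=%s --reverse br1 --not main br2 | paste -sd' ' -)
only_br2=$(git log --format=%s --reverse br2 --not main br1 | paste -sd' ' -)

echo "H is on: $on_h"
echo "only on br1: $only_br1"
echo "only on br2: $only_br2"
```

Nothing in the commits themselves records a branch; both answers are computed by walking backwards from whatever the names currently point to.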
When we use git merge, we're not really merging branches. We're really merging commits. Let's do that now:
git switch br1
git merge br2
The git switch makes commit J the current commit, by attaching HEAD to br1. The git merge has Git locate not one, not two, but three commits:
- the current commit is the HEAD commit, J;
- the other commit is the one that br2 points to, L; and
- the merge base is defined through the Lowest Common Ancestor algorithm, whose inputs are commits J and L, and this algorithm coughs up the hash ID of commit H.
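You can ask Git for the merge base yourself with git merge-base. The sketch below rebuilds the drawing in a throwaway repository (empty commits, letter subjects, and the commit helper are assumptions for illustration) and checks that the answer is the commit main points to, which plays the role of H here.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false
commit() { git commit -q --allow-empty -m "$1"; }   # illustrative helper

commit G; commit H
git branch br1
git branch br2
git switch -q br1; commit I; commit J
git switch -q br2; commit K; commit L
git switch -q br1

# The merge base of the two tip commits J and L.
base=$(git merge-base br1 br2)
base_subject=$(git log -1 --format=%s "$base")
echo "merge base of br1 and br2: $base_subject"
```

Note that the inputs are really two commits; the branch names are only a convenient way to supply their hash IDs.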
The merge itself works by comparing the stored snapshots in H, J, and L. This allows Git to figure out "what we did" on the H-I-J chain, and "what they did" on the H-K-L chain. (Note that commits I and K are used only for their linkage here, not for their snapshots: both link back to commit H, which caused commit H to be the merge base.)
If all goes well, Git makes the new merge commit on its own. This new merge commit M has not one but two parents (two backwards-pointing arrows) linking to both commits J and L, like this:
          I--J
         /    \
...--G--H      M
         \    /
          K--L
I've temporarily taken all the branch names away from the drawing, because we don't need them: commits exist independently of any branch names. But making a new commit in Git always does the same thing:
- when we made commit I, Git wrote the new commit's hash ID into the then-current branch name br1;
- when we made commit J, Git wrote the new commit's hash ID into the then-current branch name br1; and
- when we made commits K and L, Git wrote the new commit's hash ID into the then-current branch name br2;
so now that we made M, Git writes M's hash ID into the now-current branch name br1:
          I--J
         /    \
...--G--H      M <-- br1 (HEAD)
         \    /
          K--L
Names main and br2 still exist, and still point to H and L. There's no room to draw in main, in this ASCII art, and there's no need to draw in br2 right now. We can instead ask: Which commits are reachable from the name br1? The answer is: All of them!
Commits K-L were only on br2 before, but now, because of merge commit M, commits K-L are on two branches. So that gets us an answer to your original question, as long as we rephrase it slightly: after a true merge, deleting a branch name is "safe" because the commits are still findable via the merge commit. They're now "on" both branches, and taking away one name (the name we're not using right now, br2 in this case) still leaves at least one other name that they're "on".
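Here is a sketch of that true-merge case, again in a throwaway repository with empty commits and an illustrative commit helper (all assumptions of mine). It merges br2 into br1, deletes the name br2, and then checks that the commit br2 used to point to is still an ancestor of br1.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false
commit() { git commit -q --allow-empty -m "$1"; }   # illustrative helper

commit G; commit H
git branch br1
git branch br2
git switch -q br1; commit I; commit J
git switch -q br2; commit K; commit L

git switch -q br1
l=$(git rev-parse br2)          # remember L's hash ID before the name goes away
git merge -q -m M br2           # true merge: neither tip is an ancestor of the other
parents=$(git log -1 --format=%P br1 | wc -w)   # number of parents of M

git branch -q -d br2            # -d succeeds because br2 is merged into HEAD
if git merge-base --is-ancestor "$l" br1; then
    still_reachable=yes
else
    still_reachable=no
fi
echo "M has $parents parents; L still reachable from br1: $still_reachable"
```

The lowercase -d is the cautious form of branch deletion: it refuses to remove a name whose tip commit is not reachable from the current branch (or from its upstream, if it has one).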
While the git merge command sometimes makes a merge commit like M, starting from a graph like this:
          I--J <-- br1
         /
...--G--H
         \
          K--L <-- br2
we can come up with other situations where it doesn't:
...--P--Q <-- br3 (HEAD)
         \
          R <-- br4
Here, git merge br4 will do a fast forward operation instead of a merge, producing:
...--P--Q--R <-- br3 (HEAD), br4
In the case of a fast-forward, deleting br4 is still safe: the commit that used to be only "on" br4, commit R, is now "on" br3 too.
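The fast-forward case is easy to reproduce. In this sketch (throwaway repository, empty commits, illustrative commit helper, default merge configuration assumed), no new commit is created: the name br3 simply moves to the commit br4 already points to.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br3
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false
commit() { git commit -q --allow-empty -m "$1"; }   # illustrative helper

commit P; commit Q
git switch -q -c br4
commit R
git switch -q br3

before=$(git rev-list --count --all)   # commits reachable from any ref
git merge -q br4                       # br3's tip is an ancestor of br4's: fast-forward
after=$(git rev-list --count --all)

tip_br3=$(git rev-parse br3)
tip_br4=$(git rev-parse br4)
echo "commits before: $before, after: $after"
```

Since both names now hold the same hash ID, deleting either one leaves R reachable through the other.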
But we can also run git merge --squash, and that particular option directs git merge to make a non-merge "squash" commit:
          I--J <-- br1 (HEAD)
         /
...--G--H
         \
          K--L <-- br2
[we now run:
git merge --squash br2
and a second Git command that we're forced to run, to get:]
          I--J--S <-- br1 (HEAD)
         /
...--G--H
         \
          K--L <-- br2
New commit S here, after the git merge --squash, has the same snapshot we'd get if we had git merge make a true merge. That is, Git still went through all the normal "find the merge base, run two diffs, combine work" steps that it would do for a true merge. But then git merge stops and makes us run git commit,3 and when we do, git commit makes an ordinary non-merge commit, which I drew above as S.
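A sketch of the squash case follows. Unlike the other examples it needs real file changes, because a squash that stages nothing would leave git commit with nothing to do; the file names and the shortened history (one commit per side) are assumptions for illustration. The point to observe is that S has a single parent, so the tip of br2 is not an ancestor of br1, and the cautious git branch -d refuses to delete the name.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false

echo base > base.txt;     git add base.txt;   git commit -q -m H
git branch br2
echo ours > ours.txt;     git add ours.txt;   git commit -q -m J
git switch -q br2
echo theirs > theirs.txt; git add theirs.txt; git commit -q -m L
git switch -q br1

git merge -q --squash br2     # combines the work, then stops without committing
git commit -q -m S            # the second command we are forced to run

parent_count=$(git log -1 --format=%P br1 | wc -w)
if git merge-base --is-ancestor br2 br1; then l_on_br1=yes; else l_on_br1=no; fi
if git branch -d br2 2>/dev/null; then deleted=yes; else deleted=no; fi
echo "S has $parent_count parent(s); L on br1: $l_on_br1; br2 deleted: $deleted"
```

The snapshot in S does contain theirs.txt, but the graph has no link from S to L, which is what reachability cares about.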
3There's no good reason for this. If we want this action, we can run git merge --squash --no-commit. This combination is allowed! It does the exact same thing as git merge --squash today. But in the distant past, the --squash option was handled as a special case of --no-commit, so that it did both things, and that means that it now has to keep doing both things in the name of backwards compatibility.
In general, in Git, we (and even Git itself) find commits using names. They do not have to be branch names, but they very typically are branch names, or in clones, remote-tracking names (origin/* for instance). Regardless of the kind of name (branch name, tag name, remote-tracking name, internal bisection reference, or whatever it might be) the name holds one hash ID. If that's a commit hash ID, it suffices to find all predecessor commits, through the graph reachability algorithms.
But sometimes we might have commits that can only be found by one ref, such as the one branch name br2:
          I--J--S <-- br1 (HEAD)
         /
...--G--H
         \
          K--L <-- br2
If we delete this one ref br2, how do we find commits K-L?
One answer is: we don't. (Another—but only temporary—answer is to use Git's reflogs, which semi-secretly hold on to commit hash IDs for a while. Eventually the reflog entries expire, though, and then we're back to the "we don't" answer.)
If we, and Git, cannot find a commit, that commit becomes eligible for "garbage collection" under git gc.4 Git will run git gc for you, automatically, at irregular and Git-determined times. This git gc will (slowly and painfully, by crawling through the entire repository R) find any commits and other Git objects that are unreachable and, if several other conditions are met,5 actually remove the objects from the repository objects database.
This gc system is quite clever. It allows Git programs to generate internal objects freely whenever they're useful, then simply abandon them when they have no use any more. The garbage collector / janitorial service will come along later and clean up.
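In a local repository you can watch this happen. The sketch below is an illustration only, using a throwaway repository and empty commits: it force-deletes br2, lists the now-unreachable commits with git fsck, then expires the reflogs and runs git gc --prune=now to skip the usual grace period. None of this can be run against a repository hosted on GitHub; it only shows the mechanism.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false
commit() { git commit -q --allow-empty -m "$1"; }   # illustrative helper

commit G; commit H
git switch -q -c br2
commit K; commit L
l=$(git rev-parse br2)
git switch -q main
git branch -q -D br2            # force-delete: K and L now have no name

# Ignoring reflogs, which commits can no longer be reached from any ref?
unreachable=$(LC_ALL=C git fsck --unreachable --no-reflogs 2>/dev/null |
    grep -c 'unreachable commit')

# The objects are still in the database for now.
if git cat-file -e "$l" 2>/dev/null; then before_gc=present; else before_gc=gone; fi

# Drop the reflog entries and collect immediately instead of waiting.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$l" 2>/dev/null; then after_gc=present; else after_gc=gone; fi

echo "unreachable commits: $unreachable; before gc: $before_gc; after gc: $after_gc"
```

Using --prune=now on a repository that other commands are writing to is exactly the hazard footnote 5 describes, so this is for experiments, not routine use.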
4git gc is part of general Git maintenance-and-housekeeping, and there is ongoing work now on a git maintenance command that will handle this in a more generalized, predictable, and usable fashion for server setups. It's possible that git maintenance may eventually be useful to ordinary users as well as Git administrators, but there is much more to be done here first.
5The most important one is that the object itself be sufficiently old. Since git gc can be running "in the background" at any time, it's important that it not delete an object that exists because some command (say, git commit) has just created the object, just now, but not yet gotten around to hooking it up to be visible. If git gc garbage-collected a fresh commit just before git commit could write its hash ID into a branch name, that would be bad. So everything gets, by default, a two-week window to finish up whatever it's doing. Two weeks is probably enough for git commit to finish writing out a new commit. 😀
(Kidding aside, Git's operation is so much faster than the version control systems we used in the old days. I'd best stop here, lest this turn into the Monty Python Four Yorkshiremen sketch.)
When GitHub say:
⚠️ This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
what do they mean?
The direct meaning is: this commit is not reachable from a branch name in this repository. Both the phrase I just quoted and my more-precise replacement phrase have two "this" adjectives, both functioning as determiners: a specific commit (presumably one you have displayed in a browser) and a specific repository by which their Git found the commit.
We just said that we usually find a commit using a name. But in fact, we find the underlying commit object, in the repository's object database, using its hash ID. The hash ID is the "true name" of the commit. What we found using a name was the hash ID, not the commit object itself. If we have the hash ID in hand, that's all we need, and when we look at a GitHub repository commit using a browser, we supply the commit hash ID. For instance, the URL https://github.com/git/git/commit/5a73c6bdc717127c2da99f57bc630c4efd8aed02 ends with 5a73c6bdc7.... That's a commit hash ID. So GitHub can access the commit without using a branch name.
Now, this particular commit, 5a73c6bdc7..., is the most recent master commit, at the time I write this, so if GitHub look at the branch names in this repository, they immediately see that 5a73c6bdc7... is the tip commit of master. If, by the time you read this, the GitHub refs/heads/master name locates some other commit, it's easy for the GitHub software to see if 5a73c6bdc7... is an ancestor of whatever the tip commit of master is then, and if so, 5a73c6bdc7... is still reachable from master, and hence still "on" branch master.
If we pick some other commit in some other repository, though, perhaps that commit isn't reachable from any branch name. If so, that satisfies the first part of the clause in the quote:
⚠️ This commit does not belong to any branch on this repository
and we could stop there, or speculate that perhaps git gc will eventually remove this commit. (A git gc won't remove the commit if it's findable by some other name, such as a tag name. You can have commits that can be found only via the tag name, not any branch name. Whether GitHub will produce a warning like this for such commits is up to GitHub.)
But they go on to add this:
and may belong to a fork outside of the repository.
This is GitHub-specific. Forks are not part of Git: they're a GitHub add-on. (This particular add-on is found on other hosting sites as well, but GitHub were there first, as far as I know. Bitbucket and GitLab appear to have modeled their forks on GitHub's.)
A fork, on GitHub, is a server-side clone with added features. These added features include the ability to raise Pull Requests (which are another add-on feature from GitHub). To make these Pull Requests work, GitHub internally make use of some tricks that Git has implemented for decades (at least since Git v1.0.0 in 2005-ish). One of these tricks is that Git can look in other repositories' object databases to retrieve Git objects. This means that if you have some repository Ryou on GitHub, someone else can have a different repository Rse (se stands for Someone Else) that they forked from your Ryou. They can make commits of their own and send them to Rse ... and then, under whatever conditions might apply later, you can use a URL that embeds their commit hash ID under your repository's name and, due to this sort of alternates trick, see their commit, even though it's in their fork.7
The upshot of all of this is that you can view a commit that's in their repository, that they've raised as a Pull Request to you, as if it were in your repository. When you do this, you will definitely trigger the same "does not belong to any branch on this repository" condition. That will produce the warning you see here:
⚠️ This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
In this particular case, the commit truly is not in your repository Ryou over on GitHub. So there's no way to delete it from Ryou. It isn't in Ryou, it's in Rse. You can just see it from Ryou.
You can't tell, from the warning, which underlying condition triggered the warning. All you know is that the commit you are viewing now is not reachable from any of the branch names in Ryou. That could be because it is reachable, but not from a branch name; it could be because it isn't reachable, and is waiting to be GC-ed; or it could be because it's in someone else's repository.
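In your own clone, at least, you can tell the first two conditions apart. The sketch below defines a hypothetical helper, classify (my name, not a Git command), that asks git for-each-ref which refs contain a commit. It says nothing about how GitHub decides to show its warning, and it cannot detect the fork case, since a commit that lives only in someone else's fork is normally not in your clone at all.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"               # placeholder identity
git config user.email "example@example.com"
git config commit.gpgsign false
commit() { git commit -q --allow-empty -m "$1"; }   # illustrative helper

# classify <commit>: hypothetical helper reporting how a commit is reachable.
classify() {
    if [ -n "$(git for-each-ref --contains="$1" refs/heads)" ]; then
        echo "on a branch"
    elif [ -n "$(git for-each-ref --contains="$1")" ]; then
        echo "reachable only from non-branch refs"
    else
        echo "not reachable from any ref"
    fi
}

commit G; commit H
h=$(git rev-parse HEAD)
git switch -q --detach          # further commits update no branch name
commit T
git tag v1                      # T is findable through the tag only
t=$(git rev-parse HEAD)
commit U                        # U gets no name at all
u=$(git rev-parse HEAD)
git switch -q main 2>/dev/null

for c in "$h" "$t" "$u"; do
    echo "$(git log -1 --format=%s "$c"): $(classify "$c")"
done
```

Here U is still kept alive by the HEAD reflog for a while, as described earlier, even though no ref reaches it.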
In all three cases, you can't delete the commit itself directly. In one case, git gc might delete it on its own, but you can't make GitHub run git gc.8 In one case (if you have a tag for the commit, for instance) there may be something you can do that would then enable git gc to delete it on its own. And in the final case, it's not yours to delete, even if you could get git gc to do it.
7The same sort of rules might apply to your commits as well: if they know the hash ID, they may be able to see those commits in their fork. This has obvious security implications, and I don't know what GitHub may have done about these. GitHub have a lot of very competent programmers and they may have made this all quite secure, so that you can only see their commits if they have raised a PR to you, and they can only see your commits if they're public. I am merely pointing out that at the low level, careless use of "alternates" introduces various security issues, so be careful if you use this.
8GitHub support can run git gc for you, but you must contact them to get the process started. In that sense, you can make them run git gc, but it's kind of indirect.
Upvotes: 11