Reputation: 1443
I'm wondering whether a remote Git repo does (or should) automatically delete unreferenced file objects (and trees) once it receives a push from a local repo that was rebased to skip the commits that introduced those files, along with the commits that later deleted them. Since the skipped commits are no longer in the history chain, it seems logical for the remote to delete these objects, as they are no longer part of any commit in the history. This graph may explain it:
This is the history before rebase --onto
    * b5b7c142 after-deleting offending-file
    * db759b06 deleted offending-file
    * 59a9440a added offending-file
    * 933729b1 before-adding-offending-file
which I pushed to the remote before regretting it. Here is my attempt to fix it:
    git rebase --onto 933729b1 db759b06
This effectively recreates commit b5b7c142 (after-deleting offending-file) with a different parent, 933729b1 (before-adding-offending-file), and leaves the middle two commits out of the branch's history.
This is how it looks after the rebase above (note that the top commit's SHA-1 changed because its parent changed):
    * 17c95f49 after-deleting offending-file
    | * db759b06 deleted offending-file
    | * 59a9440a added offending-file
    |/
    * 933729b1 before-adding-offending-file
This looks fine as local history. The file object still exists in .git/objects, since it is part of commits that are still present locally. What happens if I push to the remote now? Will it delete that file object from .git/objects on GitHub, now that it is not part of any commit or tree on the branch? And if not, how can I make that happen?
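The situation described above can be reproduced in a throwaway repository. This is only a sketch: the file contents, the identity settings, and the variable names `base`, `deleted`, and `blob` are made up for illustration, and the commit hashes will differ from the ones in the graphs.

```shell
#!/bin/sh
# Rebuild the four-commit history in a temporary repository.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo "base" > base.txt
git add base.txt
git commit -q -m "before-adding-offending-file"
base=$(git rev-parse HEAD)

echo "secret" > offending-file
git add offending-file
git commit -q -m "added offending-file"
blob=$(git rev-parse HEAD:offending-file)   # object ID of the file's contents

git rm -q offending-file
git commit -q -m "deleted offending-file"
deleted=$(git rev-parse HEAD)

echo "more" > after.txt
git add after.txt
git commit -q -m "after-deleting offending-file"

# Replay everything after $deleted onto $base, skipping the middle two commits.
git rebase -q --onto "$base" "$deleted"

# Show the rewritten branch next to the skipped commits, as in the second graph.
git log --graph --oneline HEAD "$deleted"

# The blob is no longer on the branch, but it is still in the object database.
git cat-file -t "$blob"
```

The last command still finds the object, which matches the observation that the file remains in .git/objects after the rebase.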
Upvotes: 0
Views: 582
Reputation: 488233
GitHub may or may not delete the unreachable commit and file some time in the future. It's up to them.
A normal everyday Git repository (one you control, for instance) will generally drop the unreferenced commit entirely when git gc runs. For that to happen, though, all references to it first have to go away. Using git rebase leaves several references behind, on purpose:
- an entry in the HEAD reflog (viewable with git reflog);
- an entry in the branch's reflog (viewable with git reflog branch);
- the special name ORIG_HEAD.
The last one will be overwritten by the next operation that saves the previous HEAD value in ORIG_HEAD. The other two will eventually be dropped due to reflog entry expiration. Each reflog entry is timestamped, and is "live" until the current time is more than the expiration time added to the entry's timestamp. Another of git gc's functions is to check for expired entries, which it will delete. The expiration time is under your control, and there are two defaults: 90 days for entries reachable from the branch's current tip (gc.reflogExpire) and 30 days for unreachable ones (gc.reflogExpireUnreachable). The distinction is not really relevant to the GitHub variant because they don't use the reflogs like this: the point is that the references have to be really gone, which takes time, and this part is true for GitHub as well.
Once the references are really gone, a git gc would discard the internal objects that hold the unwanted commit and file, provided that they're not in a kept pack. Kept packs are something you have to create on your own (Git doesn't do this itself), so if you're not doing that, you personally won't encounter this.
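In a repository you control, you don't have to wait for the expiration times. The sketch below rebuilds the question's scenario in a temporary directory, then drops the leftover references and prunes immediately. The file contents and the variable names (`base`, `deleted`, `blob`, `blob_state`) are assumptions for illustration. Expiring every reflog entry throws away your local safety net, so this is something to try on a scratch copy first.

```shell
#!/bin/sh
# Recreate the rebased history, then force the cleanup that git gc would
# otherwise do only after the reflog entries have expired.
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo "base" > base.txt && git add base.txt
git commit -q -m "before-adding-offending-file"
base=$(git rev-parse HEAD)

echo "secret" > offending-file && git add offending-file
git commit -q -m "added offending-file"
blob=$(git rev-parse HEAD:offending-file)

git rm -q offending-file
git commit -q -m "deleted offending-file"
deleted=$(git rev-parse HEAD)

echo "more" > after.txt && git add after.txt
git commit -q -m "after-deleting offending-file"
git rebase -q --onto "$base" "$deleted"

# 1. Remove ORIG_HEAD, which the rebase left pointing at the old tip.
git update-ref -d ORIG_HEAD 2>/dev/null || true

# 2. Expire all reflog entries now instead of after 30/90 days.
git reflog expire --expire=now --expire-unreachable=now --all

# 3. Repack and prune unreachable objects regardless of their age.
git gc -q --prune=now

# Check whether the offending blob is still in the object database.
if git cat-file -e "$blob" 2>/dev/null; then
    blob_state=present
else
    blob_state=gone
fi
echo "offending-file blob is $blob_state"
```

None of this affects a remote: it only cleans the local object database, which is why the GitHub side remains a separate problem.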
The main issue you'll have with GitHub is that you don't know when they will scrub their last reference, nor when they will subsequently run a git gc that will discard the object. On top of that, they add special refs for pull requests, issues, and other items, which can keep objects alive indefinitely. The upshot of all of this is that you cannot predict when or even whether some file will disappear from GitHub.
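To see that a push by itself does not remove anything on the receiving side, here is a sketch that uses a local bare repository as a stand-in for the remote. That stand-in is an assumption: GitHub's storage and maintenance are not visible from outside and may behave differently. The paths and variable names are made up for illustration. Note that the rewritten branch has to be force-pushed, because it is no longer a fast-forward of what was pushed before.

```shell
#!/bin/sh
# Push the original history, rebase, force-push, then look inside the "remote".
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

echo "base" > base.txt && git add base.txt
git commit -q -m "before-adding-offending-file"
base=$(git rev-parse HEAD)

echo "secret" > offending-file && git add offending-file
git commit -q -m "added offending-file"
blob=$(git rev-parse HEAD:offending-file)

git rm -q offending-file
git commit -q -m "deleted offending-file"
deleted=$(git rev-parse HEAD)

echo "more" > after.txt && git add after.txt
git commit -q -m "after-deleting offending-file"

branch=$(git symbolic-ref --short HEAD)
git remote add origin "$work/remote.git"
git push -q origin "$branch"            # the push that is regretted later

git rebase -q --onto "$base" "$deleted"
git push -q --force origin "$branch"    # overwrite the remote branch

# The remote branch now matches the rewritten history, but the old blob
# is still stored in the remote's object database.
if git --git-dir="$work/remote.git" cat-file -e "$blob" 2>/dev/null; then
    remote_state=present
else
    remote_state=gone
fi
echo "offending-file blob on the remote is $remote_state"
```

In this local setup the object stays until someone runs a pruning git gc in the bare repository; on GitHub that step is out of your hands.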
Note that you can contact GitHub support and get them to do a manual scrub. Of course, by then, any number of people could have obtained this file, so if there's any sensitive data in it, consider it to be well-known to the black-hat hacker community by now.
Upvotes: 2