Reputation: 10073
I want to reduce the size of my repository by removing all commits older than the 5th commit ago.
This question is different than other questions because I am looking for answers only for that very specific way to reduce the size.
I have read the other similar questions and the answers are confusing because there are so many options. I am hoping that by making my request very specific I can get a very specific answer that will be easy to execute.
I am hoping that this can be a specific enumerated list of instructions starting with a git clone myrepo and ending with a git push --force myrepo or something like that.
Upvotes: -2
Views: 168
Reputation: 60555
I'll skip over the full hour-long "this is a very bad idea" exhortatory sermon except for its last two sentences: Don't say you weren't warned. This is a very bad idea.
removing all commits older than the 5th commit ago
Assuming by "ago" you're referring to commit date, here's a starter kit. The monkey on my back made me fix it up to run pretty well; I tested running the commands this prints on the Git history, and it takes about two seconds:
since=$(git log -1 --skip 4 --branches --pretty=%cI);
git rev-list --parents --branches --since=$since --reverse \
| awk '{ ++keep[$1]
for (f=NF;f>1;--f) if (!keep[$f]) {++drop[$f];$f=""}
print "git replace --graft", $0
}
END { print "FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f -- --branches \\"
for (k in keep) if (keep[k]) print "--ancestry-path="k" \\"
for (d in drop) if (drop[d]) print "^"d" \\"
print ";"
}'
and that will print the commands to replace your existing history with one lacking any commits made before the fifth commit ago -- except any branches that would be entirely deleted haven't been, yet. When you run them they will just temporarily rewire ancestry of existing commits.
If you don't like what you see with git log --oneline --branches --since=$since after that, you can git fetch -u . +refs/original/*:*; git replace -d $(git replace) to undo the truncations, no harm done.
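To see the "temporary rewiring" in isolation, here is a minimal sketch on a throwaway repository. The eight empty commits, the Demo identity, and the choice of HEAD~4 as the cut point are assumptions for illustration only; it is not the script above, just the git replace --graft mechanism it relies on.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository with eight commits (hypothetical stand-in history).
repo=$(mktemp -d)
cd "$repo"
git init -q
for msg in a b c d e f g h; do
    git commit -q --allow-empty -m "$msg"
done
before=$(git rev-list --count HEAD)

# Graft the fifth-newest commit so that it appears to have no parents.
cut=$(git rev-parse HEAD~4)
git replace --graft "$cut"
grafted=$(git rev-list --count HEAD)

# Deleting the replacement ref restores the original ancestry.
git replace -d "$cut" >/dev/null
restored=$(git rev-list --count HEAD)

echo "before=$before grafted=$grafted restored=$restored"
```

Nothing is rewritten at this stage: the graft is only a replacement ref that history-walking commands honor, which is why deleting it is a complete undo.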
If instead you run the rewrite commands it prints and then follow them with the commands below, you'll finish baking in the rewrites, delete any completely outdated branches, and compact the repository:
# this batch makes backout in this clone impossible:
git replace -d $(git replace)
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --setup exit
`git config core.bare` || git checkout --detach
git log --no-walk --branches --before=$since --pretty='git branch -D %S' | sh
`git config core.bare` || git checkout -
git reflog expire --all --expire-unreachable=now
git for-each-ref refs/remotes --format='delete %(refname)' | git update-ref --stdin
git repack -ad
# this puts the upstream on the path to unrecoverability too:
git push -f --branches --prune
and your upstream repo will lose the old history too, once it gets garbage-collected after the pruning interval expires.
This does not attempt to account for tags; it ignores their existence. If you're using tags on a five-commit history, I'll leave it to you to extend this to rewrite tags that should also be re-hung and delete tags that shouldn't.
Upvotes: 3
Reputation: 535989
The concept "by removing all commits older than the 5th commit ago" is not really coherent. However, let us presume the simplest possible case, that what you mean is: you have a main branch main, and you want the repo to consist only of commits main~5 thru main, and none of these is a merge commit.
That in itself is not really coherent either, because by removing any earlier commits, we will be changing history, and a commit's history can never be changed. So what we really want are new commits that look like commits main~5 thru main, and they should be, in effect, the only commits in the history of main.
Very well. You will be working in the local repo (which we will presume matches the remote repo, because, as you have said, you cloned it, or because you fetched it and updated your local main). As an illustration I will presume the history initially looks something like this:
* 1cebb20 (HEAD -> main) h
* 44285d5 g
* ab95e96 f
* 7896607 e
* 8d0a11a d
* 214a6c5 c
* dd7cb4c b
* 769dfff a
There may be commits before a but never mind that. The point then is that we want, in effect, the first commit with any content in the history to be just like c (because it is 5 before h; it is main~5).
That part of the task is the hardest. We need a commit that is not c (because we are going to change parentage) but whose contents look like those of c. This commit will need a parent that is itself parentless, what Git calls an "orphan". To effect this, you would first say:
% git switch --orphan temp
% git commit --allow-empty -mnewroot
% git cat-file commit main~5
tree 04a59185a0c5f4047e4fd3fa87b0c84e671b00ee
parent ...
author ...
committer ...
Okay, we have made an empty parentless commit, pointed to by temp. And Git has told us how to get at the content of the commit 5 before main. We want to take that content and pour it into a new commit whose parent is the temp branch we have just created. We do this by using the tree SHA that we were just given, like this:
% git commit-tree -p temp -m 'c' 04a59185a0c5f4047e4fd3fa87b0c84e671b00ee
b1fa80953a368fa6cc7f58b2018be19d2adf2b69
(Naturally, your numbers here will be different.)
Okay, so we have made a new commit that looks like c and has the empty parentless (orphan) temp commit as its parent. Git has also told us the SHA of this new commit, the new c. The rest is easy: we simply rebase the remainder of main onto that commit:
% git rebase --onto b1fa80953a main~5 main
Done! The history now looks like this:
* 3581893 (HEAD -> main) h
* 95227d0 g
* df95fd3 f
* f2f1edf e
* a910be2 d
* b1fa809 c
* 8f41473 (temp) newroot
We can now delete temp, as its job is done.
% git branch -D temp
And then of course, as you rightly suggest, you would need to push with force in order to update an existing remote repo; but it would be much simpler and more efficient at this point just to delete the remote repo and make a new one (and GitHub will then give you instructions for associating your local repo with this new repo and pushing main).
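The same sequence can be run without copying hashes by hand, by capturing the tree and the new commit in shell variables. What follows is only a sketch on a throwaway repository: the eight one-file commits a through h, the Demo identity, and the variable names are assumptions made for illustration.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository: commits a..h on main, one new file per commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
for msg in a b c d e f g h; do
    echo "$msg" > "$msg.txt"
    git add "$msg.txt"
    git commit -q -m "$msg"
done
old_tree=$(git rev-parse 'main^{tree}')

# Empty parentless commit that will become the new root.
git switch -q --orphan temp
git commit -q --allow-empty -m newroot

# New commit with the content of main~5, whose parent is the new root.
base_tree=$(git rev-parse 'main~5^{tree}')
new_base=$(git commit-tree -p temp -m c "$base_tree")

# Replay everything after main~5 onto it, then drop the helper branch.
git rebase -q --onto "$new_base" main~5 main
git branch -q -D temp

new_count=$(git rev-list --count main)
new_tree=$(git rev-parse 'main^{tree}')
echo "commits on main: $new_count"
```

Comparing the tree of the old and new tip is a cheap check that the rewrite changed only ancestry and not the content of the latest commit.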
Upvotes: 4
Reputation: 94963
Another approach: cherry-picking into an orphan branch. First, create an orphan branch:
git switch --orphan=new-master
Clean up the directory:
git clean -fdx
Copy an old commit completely to create a basis for further cherry-picking:
git restore -s master~4 .
git add -A .
git commit -C master~4
Now cherry-pick 4 commits back to the tip of master:
git cherry-pick master~4..master
Delete branch master and rename the current branch to master:
git branch -D master
git branch -m master
PS. Preserve the full repo backup for some time.
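One possible way to keep such a backup is a single-file bundle of all refs. This is a sketch under assumptions, not part of the recipe above: the throwaway three-commit repository, the file name backup.bundle, and the Demo identity are made up for illustration.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository standing in for the one about to be rewritten.
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
for msg in a b c; do
    git commit -q --allow-empty -m "$msg"
done

# Pack every ref and its full history into one file, then check it.
git bundle create ../backup.bundle --all 2>/dev/null
git bundle verify ../backup.bundle >/dev/null 2>&1

# A bundle can be cloned like a remote if the old history is needed again.
git clone -q ../backup.bundle ../restored
restored_count=$(git -C ../restored rev-list --count HEAD)
echo "commits recoverable from the bundle: $restored_count"
```

The bundle has to be created before any branch is deleted or rewritten, since it only contains what the refs can still reach at that moment.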
Upvotes: 0
Reputation: 198199
Disable your internet connection and clone the repository you want to vacuum locally without checking out a branch.
Enter the new work tree and create a new, empty branch.
Now point git at the previous work tree, archive the last revision before the ones you would like to keep, tar-pipe it into your new work tree, and commit it as the base.
Then cherry pick via the remote branch references the five commits you want to have.
Then remove the remote.
Rename the branch.
Run the garbage collection.
Now compare how many bytes this has saved. This may vary greatly, but at least you can directly compare between the local clones.
If it's good, delete the old location so that you can move the new clone into the old place.
Add the online remote you want to push to and go online again.
Now delete all references on the remote.
When the remote repository is completely empty, push the new history.
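The local part of these steps might look roughly like the sketch below. It runs against a throwaway repository, and the directory names old and new, the branch name vacuumed, the Demo identity, and the eight one-file commits are all assumptions for illustration; the steps that touch the online remote are left out.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the existing repository: eight commits on main.
git init -q old
git -C old symbolic-ref HEAD refs/heads/main
for msg in a b c d e f g h; do
    echo "$msg" > "old/$msg.txt"
    git -C old add "$msg.txt"
    git -C old commit -q -m "$msg"
done

# Clone locally without checking out a branch, then start an empty branch.
git clone -q --no-checkout old new
cd new
git switch -q --orphan vacuumed

# Export the last revision before the five to keep and commit it as the base.
git -C ../old archive main~5 | tar -xf -
git add -A
git commit -q -m "base: snapshot of main~5"

# Replay the five newest commits via the remote-tracking branch.
git cherry-pick origin/main~5..origin/main >/dev/null

# Drop the remote, take over the branch name, and collect garbage.
git remote remove origin
git branch -M main
git reflog expire --expire=now --all
git gc -q --prune=now

kept=$(git rev-list --count --all)
echo "commits kept: $kept"
```

Running du -sh on the .git directory of each clone afterwards gives the size comparison the steps call for.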
Upvotes: -1
Reputation: 94963
Make a backup and try this shell script on a freshly cloned repository:
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
--prune-empty --setup "COMMIT_COUNT=`git rev-list --count HEAD`" \
--index-filter '
if [ "$COMMIT_COUNT" -gt 5 ]; then
git rm -r --quiet .
COMMIT_COUNT=`expr $COMMIT_COUNT - 1`
fi
' HEAD
Upvotes: 0