Reputation: 51
My repo is forked from an open-source project, so I don't want to modify the commits before the ForkPoint tag. I've tried the BFG Repo-Cleaner, but it doesn't let me specify a range.
I want to clean up only the range ForkPoint..HEAD^.
The question "How to remove unused objects from a git repository?" says it should be something like this:
BADFILES=$(find . -type f -size +10M -exec echo -n "'{}' " \;)
git filter-branch --index-filter \
"git rm -rf --cached --ignore-unmatch $BADFILES" ForkPoint..HEAD^
but wouldn't BADFILES only contain the files that exist in HEAD?
For instance, if I've mistakenly committed a HUGE_FILE and later made another commit that removes it, the BADFILES search wouldn't find HUGE_FILE, since find doesn't see it in the current working tree.
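As a point of comparison, here is a sketch (assuming a reasonably recent git that supports cat-file's --batch-check format) that lists large blobs anywhere in history rather than just in the working tree; the 10485760-byte threshold stands in for 10M:

```shell
# List every blob larger than 10 MB reachable from any ref, with its path.
# Unlike find, this also sees files already deleted from the working tree.
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
awk '$1 == "blob" && $2 > 10485760 { print $2, $4 }'
```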
Edit1: Now I'm considering using BFG on a clone, then moving my fork onto the original ForkPoint. Would these be the right commands, given fatRepo and slimRepo?
mkdir merger ; cd merger ; git init
git remote add fat ../fatRepo
git remote add slim ../slimRepo
git fetch --all
git checkout fat/ForkPoint
git cherry-pick slim/ForkPoint..slim/branchHead
Edit2: Cherry-picking didn't work, because cherry-pick can't handle the merges in slimRepo. Can I somehow squash down the history of slimRepo and simply merge it onto fatRepo/ForkPoint?
git <turn into a single commit> slim/rootNode..slim/ForkPoint
git checkout fat/ForkPoint
git merge slim/branchHead
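One possibility I'm considering (a sketch, untested against the real repos; it assumes slim/ForkPoint resolves to a commit) is to graft a parentless copy of ForkPoint in place of the whole pre-fork history with git replace:

```shell
# Make a parentless commit with the same tree as slim/ForkPoint, then
# tell git to use it in place of the real ForkPoint commit. Everything
# in slim/rootNode..slim/ForkPoint collapses into this single commit.
root=$(git commit-tree 'slim/ForkPoint^{tree}' -m 'squashed pre-fork history')
git replace 'slim/ForkPoint^{commit}' "$root"
```

Running git filter-branch -- --all afterwards would make the replacement permanent; whether the final merge onto fat/ForkPoint succeeds still depends on how the two trees relate.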
Upvotes: 2
Views: 907
Reputation: 488463
Yes, you are correct.
If you can identify the files in advance, just list them manually.
If you need to pick out large files from each commit, you can use a tree filter, or an index filter that inspects the tree of $GIT_COMMIT (or, of course, anything else you can come up with).
The index filter is much faster, as it allows you (and git) to skip the messy business of turning each to-be-filtered commit into a work-tree and back. If there are few commits to copy, however, you will be putting time and mental effort into something with an overall small return. If you wish to go this way, note that you need sufficient quoting so that $GIT_COMMIT is expanded at the time the eval occurs, not before ($GIT_COMMIT is put into the environment; see, e.g., the script trick below).
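To illustrate the quoting point with a toy example (not the actual filter): single quotes keep your interactive shell from expanding $GIT_COMMIT too early, so the later eval sees the variable and expands it from the environment, just as filter-branch does for each commit:

```shell
# Single quotes: $GIT_COMMIT survives, unexpanded, inside the string.
cmd='echo processing $GIT_COMMIT'
GIT_COMMIT=deadbeef          # filter-branch sets this for each commit
export GIT_COMMIT
eval "$cmd"                  # prints: processing deadbeef
```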
The tree filter is easy to use: in this case, git extracts the original commit into a clean, empty sub-directory (by default, a sub-directory created within the .git directory containing the repository, but see the -d argument) and runs your filter in that sub-directory. Whatever files remain afterward are put into a new commit, with the other filters, if any, also applied (in the order given in the documentation). So your tree filter could simply be:
find . -type f -size +10M -exec rm '{}' ';'
Note that the string is passed to eval, so it is necessary to use several levels of quoting. Alternatively, you can simply run it by a full path name: put your script in a file such as /tmp/cleanup.sh, make it executable, and use:
git filter-branch --tree-filter /tmp/cleanup.sh ForkPoint..HEAD^
The tree-filter will be slow, but you might not care that much, especially if your range contains only a handful of commits.
Edit: to find large files in a particular commit (or other tree) by looking at the tree stored in that commit—this is what you would need in an index filter—you can use this script-ette (lightly tested):
git ls-tree -lr $ref |
while read mode type hash size path; do
    [ "$size" != "-" ] && [ "$size" -gt "$limit" ] && echo "$size $path"
done
Choose suitable values for $ref ($GIT_COMMIT in an index filter) and $limit. Change the echo command to git rm --cached -- "$path" to remove the files in the filter. (You won't need --ignore-unmatch, since the paths are found by looking at the tree for that commit.)
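Putting this together, the final index filter might look something like the sketch below; /tmp/indexfilter.sh is a hypothetical path, and the 10485760-byte limit is just an example:

```shell
# Hypothetical index filter: drop every blob over $limit from the index
# of the commit currently being rewritten ($GIT_COMMIT).
cat > /tmp/indexfilter.sh <<'EOF'
#!/bin/sh
limit=10485760
git ls-tree -lr "$GIT_COMMIT" |
while read mode type hash size path; do
    if [ "$size" != "-" ] && [ "$size" -gt "$limit" ]; then
        git rm -q --cached -- "$path"
    fi
done
EOF
chmod +x /tmp/indexfilter.sh
git filter-branch --index-filter /tmp/indexfilter.sh ForkPoint..HEAD^
```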
You can see what this would do by using git rev-list to prepare a set of refs first:
git rev-list ForkPoint..HEAD^ | /tmp/script
where /tmp/script is:
#!/bin/sh
# Read revisions on stdin; list the files over $limit in each one's tree.
check_tree() {
    git ls-tree -lr "$1" |
    while read mode type hash size path; do
        [ "$size" != "-" ] && [ "$size" -gt "$limit" ] && echo "$size $path"
    done
}
limit=1000000   # or whatever number
while read rev; do
    check_tree "$rev"
done
Then use a slightly modified script (as noted above) as the actual index filter, once you have found the desired size-limit value.
Upvotes: 1