Thijs Steel

Reputation: 1272

git filter-branch doesn't delete all the files I want

I'm trying to clean up a Git repository of LaTeX code that contains the generated PDF files, because these files have caused the repo to balloon to about 300 MB.

Adapting the answer to "How to remove file from Git history?", I tried the following command:

git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch *.pdf' HEAD

This reduced the size a little, but not as much as I'd hoped. When I then run the script from the answer to "How to find/identify large commits in git history?" to find which files contribute to the size, it still shows several PDF files. However, when I run the script from "Which commit has this blob?", it cannot find any commit that contains those files.

I have removed all branches except the local branch. I have not pushed the changes to the remote.

Is there any reason these files would still persist in the history somewhere? What other things can I try?

Upvotes: 0

Views: 194

Answers (1)

LeGEC

Reputation: 51850

You may have blobs still present just because the garbage collector didn't collect them.
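
To see whether the PDFs are still pinned by some ref (for example the backup that filter-branch keeps under refs/original/) rather than merely waiting for collection, you could list the largest blobs reachable from any ref. This is only a sketch: largest_blobs is a made-up helper name, and it assumes you run it from inside the repository.

```shell
# Hypothetical helper: print the N largest blobs reachable from any ref
# (N defaults to 10). Output columns: type, object id, size in bytes, path.
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |
        sort -k3,3nr |
        head -n "${1:-10}"
}
```

If a PDF shows up here with its path, a ref still reaches it and gc will not remove it. If it does not show up but the repository is still large, the blob is probably unreachable and just not collected yet.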

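Try cloning your local repo and checking the size of the .git/ directory in the new clone. Use --no-local (or a file:// URL) for this: a clone from a plain path just copies or hardlinks everything under .git/objects, unreachable objects included.

# --no-local forces the regular transport, which only transfers reachable objects
git clone --no-local myrepodir myclone
cd myclone
du -sh .git

# you can then remove that clone:
cd ..
rm -rf myclone

This gives a more accurate view of how much data would be pushed or cloned.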


If you are 100% sure that the content after your filter-branch action is the content you want to keep, and you don't mind losing your reflog (no more undos, drops all your stashes), you can run:

git gc --aggressive --prune=now

See also git help gc for more details on what could be retained on your disk.
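
Note that gc only prunes objects that nothing references any more. filter-branch keeps a backup of the old history under refs/original/, and the reflog still points at the old commits, so both need to go before the old blobs can be pruned. A minimal sketch of the full cleanup, following the shrinking checklist in the git filter-branch documentation (purge_rewritten_history is a made-up name, and this is just as irreversible as the command above):

```shell
# Hypothetical helper: remove everything that still pins the pre-rewrite
# history, then repack. Run from inside the rewritten repository.
purge_rewritten_history() {
    # delete the backup refs that filter-branch left under refs/original/
    git for-each-ref --format='delete %(refname)' refs/original |
        git update-ref --stdin
    # expire every reflog entry so the old commits become unreachable
    git reflog expire --expire=now --all
    # repack and drop unreachable objects immediately
    git gc --prune=now
}
```

Afterwards, du -sh .git should show the reduced size in place, without needing a fresh clone.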

Upvotes: 1
