fiscme

Reputation: 422

How to remove committed files no longer in working directory

I have a repo with lots of files that are no longer in the working directory: files that have been added and removed over the months and years of the repository's life.

I would like to make a file listing all the files that are stored in the commit history but are no longer required, including their locations, e.g.:

/web/scripts/index.php
/sql/tables.sql
...

Then I would like a command that runs through that file and completely removes the files referenced in it from the commit history, similar to what git rm --cached does, but for a list of files.

Upvotes: 4

Views: 362

Answers (2)

Luke Davis

Reputation: 2666

Adding to @David's answer: if you want to be extra careful not to delete files that were removed at some point but added back later in the history, use the following block of commands instead of git delete $(git log --all --pretty=format: --name-only --diff-filter=D) (consider adding it as a function in your .bashrc):

# Files tracked right now, and every path that was ever deleted in any branch
current=($(git ls-files))
tracked=($(git log --all --pretty=format: --name-only --diff-filter=D | sort -u))
deleted=()
resurrected=()
for file in "${tracked[@]}"; do
  # A path that is tracked again was re-added after its deletion: keep it
  if [[ " ${current[*]} " == *" $file "* ]]; then
    resurrected+=("$file")
  else
    deleted+=("$file")
  fi
done
echo "Deleted: ${deleted[*]}"
echo "Resurrected: ${resurrected[*]}"
git delete "${deleted[@]}"

Upvotes: 0

David Cain

Reputation: 17343

Short answer

Alias David Underhill's script, then run (with caution):

$ git delete `git log --all --pretty=format: --name-only --diff-filter=D`

Explanation

David Underhill's command uses filter-branch to modify the history of your repository, removing all history of a given file path.

The script, in its entirety (source):

#!/bin/bash
set -o errexit

# Author: David Underhill
# Script to permanently delete files/folders from your git repository.  To use 
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2

if [ $# -eq 0 ]; then
    exit 0
fi

# make sure we're at the root of git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi

# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" HEAD

# remove the temporary history git-filter-branch otherwise leaves behind for a long time
rm -rf .git/refs/original/ && git reflog expire --all &&  git gc --aggressive --prune

Save this script to a location on your hard drive (e.g. /path/to/deletion_script.sh), and make sure it's executable (chmod +x /path/to/deletion_script.sh).

Then alias the command:

$ git config --global alias.delete '!/path/to/deletion_script.sh'

To get a sorted list of all deleted files:

$ git log --all --pretty=format: --name-only --diff-filter=D | sort -u
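
That list also contains paths that were deleted once and later added back. To get the file of removable paths the question asks for, the list can be compared against what is tracked now. A minimal sketch, assuming hypothetical file names (all_deleted.txt, still_tracked.txt, to_delete.txt) and a hypothetical helper dead_paths; none of these are part of git or of the script above:

```shell
# Hypothetical helper: print the paths that appear in DELETED_LIST but not in
# CURRENT_LIST (both files hold one path per line).
dead_paths() {
    deleted_list=$1
    current_list=$2
    work=$(mktemp -d) || return 1
    # comm needs both inputs sorted with the same collation; blank lines are
    # dropped because `git log --pretty=format:` emits one per commit.
    sed '/^$/d' "$deleted_list" | LC_ALL=C sort -u > "$work/deleted"
    LC_ALL=C sort -u "$current_list" > "$work/current"
    # -23 suppresses lines unique to the second file and lines common to both
    LC_ALL=C comm -23 "$work/deleted" "$work/current"
    rm -rf "$work"
}

# Usage, from the root of the repository:
#   git log --all --pretty=format: --name-only --diff-filter=D > all_deleted.txt
#   git ls-files > still_tracked.txt
#   dead_paths all_deleted.txt still_tracked.txt > to_delete.txt
```

Reviewing to_delete.txt by hand before rewriting anything is the main reason to go through a file rather than a one-liner.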

Bringing it all together

With a list of deleted files, it's just a matter of hooking up git delete to process each file in the list:

$ git delete `git log --all --pretty=format: --name-only --diff-filter=D`
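
If the paths live in a file instead, as described in the question, the file can be read line by line and its entries handed to git delete in a single call. A minimal sketch, assuming a hypothetical file to_delete.txt with one path per line; delete_from_list is an illustrative name. Leading slashes are stripped because the paths have to be relative to the repository root, and any extra arguments replace git delete, which allows a dry run:

```shell
# Hypothetical helper: run a command (default: the `git delete` alias) on
# every path listed in a file, one path per line.
delete_from_list() {
    list=$1
    shift
    # Extra arguments replace the default command, e.g. `echo` for a dry run.
    [ $# -gt 0 ] || set -- git delete
    while IFS= read -r entry || [ -n "$entry" ]; do
        entry=${entry#/}            # "/sql/tables.sql" -> "sql/tables.sql"
        if [ -n "$entry" ]; then
            set -- "$@" "$entry"    # append the path to the command line
        fi
    done < "$list"
    "$@"
}

# Usage (to_delete.txt is an assumed file name):
#   delete_from_list to_delete.txt echo    # dry run: print the paths
#   delete_from_list to_delete.txt         # rewrite history for real
```

Note that the deletion script joins its arguments into one string (files=$@), so paths containing spaces would still be split there even though this helper passes them through intact.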

Testing/Example usage

  1. Make a dummy repository with additions, renamings, and deletions:

    mkdir test_repo
    cd test_repo/
    git init
    echo "Dummy content" >> stays.txt
    git add stays.txt && git commit -m "First file, will stay"
    echo "Rename content" >> will_rename.txt
    git add will_rename.txt && git commit -m "Going to rename"
    echo "Delete this file" >> will_delete.txt
    git add will_delete.txt && git commit -m "Delete this file"
    git mv will_rename.txt renamed.txt && git commit -m "File renamed"
    git rm will_delete.txt && git commit -m "File deleted"
    
  2. Inspect the history:

    $ git whatchanged --oneline
    d768c58 File deleted
    :100644 000000 7a4187c... 0000000... D  will_delete.txt
    96aadf0 File renamed
    :000000 100644 0000000... 94a12c7... A  renamed.txt
    :100644 000000 94a12c7... 0000000... D  will_rename.txt
    3ba05fa Delete this file
    :000000 100644 0000000... 7a4187c... A  will_delete.txt
    c88850a Going to rename
    :000000 100644 0000000... 94a12c7... A  will_rename.txt
    6db6015 First file, will stay
    :000000 100644 0000000... f3ae800... A  stays.txt
    
  3. Delete old files:

    $ git delete `git log --all --pretty=format: --name-only --diff-filter=D`
    Rewrite 8c2009db5ac05b27cd065482da94dec717f5ef4a (8/9)rm 'will_delete.txt'
    Rewrite e1348d588597f2f6dd63cade081e0fbdf8692c74 (9/9)
    Ref 'refs/heads/master' was rewritten
    Counting objects: 27, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (22/22), done.
    Writing objects: 100% (27/27), done.
    Total 27 (delta 12), reused 10 (delta 0)
    
  4. Inspect the history again. The deleted file no longer appears anywhere, and the renamed file looks as if it had been added under its new name from the start.

    c800020 File renamed
    :000000 100644 0000000... 94a12c7... A  renamed.txt
    0a729d7 First file, will stay
    :000000 100644 0000000... f3ae800... A  stays.txt
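
A quick way to confirm that a particular path, such as will_delete.txt in the test repository, is really gone is to ask git log whether any commit still touches it. A minimal sketch; path_in_history is a hypothetical helper name, not a git command:

```shell
# Hypothetical helper: succeed if at least one commit reachable from any ref
# touches the given path, fail otherwise.
path_in_history() {
    # `--` marks the argument as a path even if it no longer exists on disk;
    # -n 1 stops at the first matching commit.
    [ -n "$(git log --all --format=%h -n 1 -- "$1")" ]
}

# Usage, from inside the repository:
#   if path_in_history will_delete.txt; then
#       echo "still in history"
#   else
#       echo "gone"
#   fi
```

Because the check uses --all, a path that survives on another branch or tag is still reported, which matters since the deletion script only rewrites HEAD.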
    

Upvotes: 3
