Remove file from git local repo leaving files in place and protect files from future deletion

Question

This is a follow-up question to "Remove a file from a Git repository without deleting it from the local filesystem".

Step 1

mkdir ~/repo
cd ~/repo
git init .
echo important >file
git add .
git commit -m "initial commit"

cd ~
git clone repo repo2
cd ~/repo2
echo really important>file

We now have a git repo with a single file in it and clone of that repo.

Now let's say we don't want that file to be tracked by git.

Step 2

$ cd ~/repo
$ git rm --cached file
$ git commit -m "remove file from repo"
$ cat file
important
$ # ↑ yes, file is still here, good

OK, good so far; the file has been deleted in the repo.

In the other repo we may have altered the file:

Step 3

$ cd ../repo2
$ echo "really important">file
$ git pull
$ cat file
✖ cat: file: No such file or directory

Doh! Our file is now deleted.

The reason for this is that git records the deletion in a commit, and when you pull you merge in those changes, meaning it does the rm file, losing your file.

Is there any way to tell git to stop tracking a file (or dir), and not replay the deletion? I'm looking for a way to do this without requiring a forced update (i.e. without using filter to edit history).

Another way to phrase the question might be: after Step 2, instead of Step 3, we have this:

$ cd ../repo2/
$ echo "really important" >file
$ echo file >.gitignore
$ git status -s
 M file
?? .gitignore

So at this point, file is in gitignore, but git isn't ignoring it. Is there a way here to do something to make it drop it?

I realise that a valid answer might be "No, that's nonsense and would completely break the whole way git works", but I've got stung a few times with this situation so thought I'd ask.

torek · Accepted Answer

Is there any way to tell git to stop tracking a file (or dir), and not replay the deletion?

No. But there is also no way to get it to "replay the deletion". Git never really replays anything; the notion of "replaying" is shorthand for what really happens.

... So at this point, file is in gitignore, but git isn't ignoring it. Is there a way here to do something to make it drop it?

No.

The real picture—all other attempts to describe it are just playing with shadows in Plato's Cave—is this:

Git stores commits. Commits store files (plus metadata), but the unit-of-storage is the commit. (Note: commits do not store directories / folders, just files. The files' names may have embedded slashes: foo/bar/baz.ext is just a file with a long name.)
The "true name" of a commit is its hash ID.
No part of any commit can ever be changed. You can create new ones, but you cannot change existing ones. They are completely, totally read-only. The files inside them are in a special, compressed, read-only, Git-only format. I like to call them freeze-dried as they aren't usable until you rehydrate them.
Therefore, you cannot and do not work on commits. So Git provides an area in which you can do work, which Git calls your work tree or working tree or working directory or any number of similar names. I tend to use work-tree, hyphenated.

The files in your work-tree are read/write. They're just ordinary everyday files. If your file system demands that Git make a directory / folder to hold a file named dir/file.ext, Git will make the directory at that time.

Git fills in the work-tree based on some commit you check out, but there is an extra step. The commit you do check out, whatever it is—identified by hash ID, with the hash ID often found by branch name—supplies the files. Whatever that commit is, that is now your current commit. The name HEAD refers to this current commit (and also to the branch name, if you selected it by branch name).
Sitting between your current commit—which, remember, is read-only—and your work-tree, Git interposes a data structure that Git calls the index, or the staging area, or—rarely these days—the cache. These are three names for the same entity. Though the name index is kind of meaningless, that's the one I will use here. (The other two names are more suggestive as to how you use it—to "stage" files—and how Git uses it, to speed things up via caching.)

Almost all of your woes here have to do with the index, so let's look at it more closely. It has an even bigger role during conflicted merges, but most of the time the key thing to know about the index is that it represents the files you intend to store in your next commit.

When you first check out some existing commit, in a new cloned repository with a truly-empty index and no files at all in its work-tree, Git first copies all the files in the selected commit to the index.¹ The index copies (references; see footnote 1) are in the special freeze-dried format. Unlike the committed copies, though, you can overwrite them with new files.

Having filled in the index, Git now populates your work-tree by rehydrating all the index copies and putting those now-everyday-format files into the work-tree (creating folders if and when needed).

Hence, at this point, every file has three copies: the read-only HEAD copy; the overwrite-able, frozen-format index copy; and the regular-format ordinary file copy that you can actually use. Such a file is tracked, because "tracked" means "exists in the index".
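To see that "tracked" really does mean "exists in the index", you can list the index contents directly. This is only a minimal sketch in a throwaway repository; the file names a.txt and b.txt and the demo identity are invented for illustration.

```shell
# Throwaway repository; names and identity below are made up for this sketch.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d) && cd "$demo"
git init -q .

echo tracked > a.txt
echo untracked > b.txt
git add a.txt                # only a.txt is copied into the index
git commit -q -m "add a.txt"

git ls-files                 # lists the index: a.txt only
git ls-files --stage a.txt   # mode, blob hash, stage number, path
git status --short           # b.txt is reported as "?? b.txt" (untracked)
```

Both files sit side by side in the work-tree; the only thing that distinguishes them is whether the index has an entry.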

You can add a new file, that never existed before, to the index, by copying it from a work-tree copy:

echo new file > newfile.ext
git add newfile.ext

You can remove an existing file from both the index and the work-tree:

git rm oldfile

Or, you can remove the file from the index, leaving the copy in the work-tree:

git rm --cached oldfile

You can overwrite existing files, presumably with new and improved versions:

# fix up existingfile.py
git add existingfile.py

Every change you make in the index represents a proposed next commit. We've added one new file, removed one existing file, and updated one existing file, at this point. If we now run:

git commit

Git will build our new commit from whatever is in the index right now. That new commit has all the files from the previous commit except:

it no longer has oldfile;
it now has newfile.ext; and
it has a different version of existingfile.py.

If you go to view this new commit, Git shows it to you by, in a temporary area, extracting both the old and new snapshots and then comparing them.² For all the files that are the same, Git says nothing. For files that are gone (oldfile), Git claims the file is "deleted". For the new files, Git claims they are added, and for files that are in both, but different, Git produces a set of instructions that will let you change the old content to the new content.

The new commit becomes the tip commit of the branch (Git stores the new hash ID in the branch name) and HEAD now refers to the new commit we just made.
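The whole sequence above can be tried end to end in a scratch repository. This is a sketch under assumptions: the file contents and the demo identity are invented, and git diff --name-status is used here as one convenient way to view the snapshot comparison.

```shell
# Throwaway repository; contents and identity are made up for this sketch.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d) && cd "$demo"
git init -q .

echo old > oldfile
echo 'print("v1")' > existingfile.py
git add oldfile existingfile.py
git commit -q -m "first snapshot"

echo new file > newfile.ext
git add newfile.ext                        # new entry in the index
git rm -q --cached oldfile                 # drop index entry, keep work-tree copy
echo 'print("v2 improved")' > existingfile.py
git add existingfile.py                    # overwrite the index copy
git commit -q -m "second snapshot"

# Compare the two snapshots, one status letter per changed path:
# M existingfile.py, A newfile.ext, D oldfile
git diff --name-status HEAD~1 HEAD
ls oldfile                                 # the work-tree copy is still there
```

Note that the "D" line is computed by comparing the two commits; nothing in the second commit records a deletion as such.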

¹Technically, the index holds a reference to the file, rather than a copy of it. However, unless you start using git update-index directly, or inspect how much space gets used, you can't really tell the difference here. It's easier to think of the index as if it held a copy.

²Due to the internal storage format, Git can immediately skip the extraction of all unchanged files. The rest happens in memory, rather than on-disk.

So far, this probably all makes sense, so what goes wrong?

Given all of the above, suppose we have used git rm --cached to remove oldfile, so that we still have it in our work-tree.

Now, for whatever reason—curiosity, need to fix an old bug, whatever it is—we direct Git to git checkout an older commit that does have oldfile.

Git will dutifully fill in our index from every file in that commit, including oldfile. So oldfile is now tracked. Git will extract the old contents of oldfile into our work-tree.³ We do whatever it is we need to do with the old commit, and then return to our latest commit (the one now at the tip of the branch), in which oldfile isn't in the commit.

Git sees that oldfile is tracked (is in the index) but does not exist in the desired commit, so it removes oldfile from the index and from the work-tree as it checks out the desired commit + branch. The file is now no longer visible in the work-tree. Note that there is a copy of the file in Git, in the old commit. You can extract its content from that old commit, using, e.g., git show commit:path.
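The round trip described above can be reproduced in a scratch repository. This is a sketch, not the only way it can happen: the demo identity is invented, and --force is used on the first checkout only because the untracked work-tree copy of oldfile would otherwise make Git refuse.

```shell
# Throwaway repository; identity is made up for this sketch.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d) && cd "$demo"
git init -q .

echo important > oldfile
git add oldfile
git commit -q -m "add oldfile"
git rm -q --cached oldfile
git commit -q -m "stop tracking oldfile"
ls oldfile                       # still in the work-tree, now untracked

git checkout -q --force HEAD~1   # old commit: oldfile is back in the index
git ls-files                     # lists oldfile, so it is tracked again
git checkout -q -                # return to the branch tip

ls oldfile 2>/dev/null || echo "oldfile is gone from the work-tree"
git show HEAD~1:oldfile          # the committed content is still retrievable
```

What git show recovers is the committed version only; any uncommitted edits that were in the work-tree copy are not stored anywhere in Git.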

³Git will, however, first check whether this will overwrite our current copy of oldfile. If so, Git will refuse the checkout, saying that our file will be overwritten, unless you use --force or various other options. Note: adding the file to .gitignore affects this negatively: sometimes Git will feel free to clobber the file!

`.gitignore`

Listing a file in .gitignore has no effect if the file is tracked.

If the file is untracked—is not in the index—Git normally whines about it, telling you that you have an untracked file. You can make Git shut up about this by listing the file in .gitignore. Git will also not automatically add that file to the index at this point: git add * or git add . will skip the ignored (and still untracked) file.

By definition, an untracked file exists only in the work-tree, not in the index, so the next commit you make will lack the file.
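A quick way to check both halves of this is to list one tracked and one untracked file in .gitignore and compare. The file names tracked.cfg and scratch.log and the demo identity are invented for this sketch.

```shell
# Throwaway repository; names and identity are made up for this sketch.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d) && cd "$demo"
git init -q .

echo v1 > tracked.cfg
git add tracked.cfg
git commit -q -m "track tracked.cfg"

printf 'tracked.cfg\nscratch.log\n' > .gitignore
echo "v2 with a local edit" > tracked.cfg   # modify the tracked file
echo noise > scratch.log                    # untracked and ignored

# tracked.cfg still shows as modified; scratch.log is not mentioned at all.
git status --short
git check-ignore -v scratch.log             # shows which rule matched
```

The ignore rule for tracked.cfg only starts to matter once the file is no longer in the index.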

--assume-unchanged and --skip-worktree

If a file exists in the index, it is tracked. But you can get an effect similar to .gitignore by setting the assume-unchanged or skip-worktree flag on the index entry. This tells Git: when you're about to compare the index copy of the file to the work-tree copy of the file, mostly you should just assume they match.

This means that if you modify the work-tree copy of the file now, git status won't mention it. If you run git add ., git add will think it's not modified, and not overwrite the index copy. (Overwriting the index copy requires compressing / freeze-drying the file's content, which is relatively slow, compared to just noticing that the file is not changed and not overwriting.)

So, by having Git lie to itself in strategic places, you can carry the file around with local changes in the work-tree. But because there's an index copy, any time Git sees the need to change or remove the index copy for some other reason—such as switching to a commit that has a different version of the file, or that lacks the file—Git will need to update both the index and work-tree copies of the file.
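As a rough illustration of the flag in action, the sketch below sets skip-worktree on one file, edits it, and then clears the flag again. The file name settings.conf and the demo identity are invented; git update-index and git ls-files -v are the real commands involved.

```shell
# Throwaway repository; names and identity are made up for this sketch.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d) && cd "$demo"
git init -q .

echo default > settings.conf
git add settings.conf
git commit -q -m "add settings.conf"

git update-index --skip-worktree settings.conf
echo "local override value" > settings.conf

git status --short               # prints nothing: the edit is hidden
git ls-files -v settings.conf    # a leading "S" marks skip-worktree entries

git update-index --no-skip-worktree settings.conf
git status --short               # the modification is visible again
```

The index entry is still there the whole time, which is why this hides local edits from git status but does not protect the file when a checkout or merge needs to change or remove that entry.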

So what is the solution?

The real answer to this question is: there is no single solution. Your best bet is usually never to have committed the file at all. But if you did commit the file—if it exists in some set of existing commits—you must simply deal with that.

You can copy some or all of your commits to new-and-improved commits that do not have the file at all, then stop using the old commits and use only the new commits. This is what git filter-branch, The BFG, and the new git filter-repo are about.

Or you can just deal with it: remember that checking out a commit that has the file copies the file into the index and the work-tree, and that moving from such a commit to one that does not have the file removes it from both. If you have precious data in the file that should be neither committed nor removed (configuration information, for example), rename the file to a new name that you never commit, so that the new name never winds up in the repository.
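One possible shape for the rename approach is sketched below. The names config and config.local, the file contents, and the demo identity are all assumptions made for illustration; the point is only that the precious file uses a name that no commit has ever contained.

```shell
# Throwaway repository; names, contents and identity are made up for this sketch.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
demo=$(mktemp -d) && cd "$demo"
git init -q .

echo "secret=changeme" > config
git add config
git commit -q -m "config was committed at some point"

# Keep the precious local data under a name that is never committed.
echo "secret=local" > config.local
echo /config.local > .gitignore
git add .gitignore
git commit -q -m "ignore config.local"

git status --short               # prints nothing: config.local is ignored

# Moving between commits cannot touch a path that no commit contains.
git checkout -q HEAD~1
git checkout -q -
cat config.local                 # still holds the local value
```

The committed config keeps coming and going with checkouts as before; the program reading the configuration would have to be pointed at config.local, which is outside anything Git does.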
