Booboo

Reputation: 44128

Every file in the remote repository is staged for deletion

My development machine has a standard git repository (i.e. with a work directory). After changes are committed, they are pushed to a remote bare repository on my production server where a post-receive hook deploys the changes to a work directory. I also push these commits to a second backup remote bare repository on yet a different server.

I went to the production server and ran a git status command with --git-dir pointing to my bare repository and --work-tree pointing to my deployment directory. It showed every file within every sub-directory of my deployment directory staged for deletion and these same sub-directories as containing untracked files. A git log command shows that all the commits are present and a git status from my local machine shows that I am "up to date with origin/master." I also verified that a git show HEAD:file1 yields the same contents as file file1 in the deployment directory.

I then went to my backup remote repository and cloned a bare repository on my local development machine from that. I then ran a git status with --git-dir pointing to my newly cloned bare repository and --work-tree pointing to my development work directory and got the same results, i.e. that all the files were staged for deletion.

To further complicate matters I have a second project also with a remote bare repository on a different production server and deployment work directory and a second backup remote bare repository. This time when I do a git status on the production server, instead of showing all the files staged for deletion, it shows that for a few files there have been changes staged for commit but since then the files have been changed but not added. The changes that have been staged for commit seem to be the HEAD~1 versions and the files that have been "changed but not added" are the HEAD versions. If I were to execute a git checkout -f command, I believe I would end up with the HEAD~1 versions in the work directory and that would not be good. Yet when I restored the backup bare repository and did a git status command, it then showed every file staged for deletion.

Can somebody explain what might have happened to have caused this? Here is one of my concerns: The post-receive hook that I use on my production server is not implemented as many are by executing a git checkout -f command but rather something more complicated. The reason for this has to do with having been forced to use a symbolic link for a directory. But I may be able to get away from using the symbolic link and would like to use the simpler git checkout -f method for my hook. But in general git checkout -f will not replace the work directory contents with the HEAD version if there are staged changes. At first I thought there was something my post-receive hook might have done that could have explained this problem. But it cannot explain seeing the same behavior using a clone of the backup repository (there are no hooks involved) used with my own local work directory.

My local version of git is git version 2.17.1.windows.2 and the remote server versions are git version 2.19.1.

Update

I have updated my local git version to git version 2.24.1.windows.2. I then created a new bare remote repository on my backup git repository and then did ...

git remote add origin3 url-to-new-repository
git push -u origin3 --all

... in order to create a new, fresh repository (my local repository had nothing staged at the time and was shown to be in sync with my production server's remote repository).

I then went to my production server and cloned a bare repository from this newly created backup repository and repeated the git status using this new cloned repository and it still showed all the files (as far as I can tell) staged for deletion. How can this be when the files exist?

Clarifications

  1. I have never run a git --git-dir=git-repository-directory --work-tree=work-directory command specifying different directories for the same git repository.
  2. I have verified that if I newly clone a repository with a git clone --bare command, the resulting repository is missing the index file. If I then rebuild the index with a git reset command, git status produces the expected results. I have no reason to doubt that the current remote repository, which is missing the index file, was also created without one to begin with. There is the other repository that does have an index file, which has a few discrepancies that cannot be so easily explained.
  3. I do not currently deploy changes to the aforementioned repositories by using a post-receive hook that uses the git checkout -f technique (which clearly would not work in these cases until the index is repaired). The reason is that the actual work directory contains some symbolic links out of necessity which git cannot handle (and in truth I have omitted the expected errors that these symbolic links generate when testing the git status command). My actual post-receive hook is based on issuing a git diff --name-status --find-renames=100% from-revision to-revision command and analyzing the returned results. After analyzing the changes, the only git facility I use for effecting the updates is a git archive command to extract from the repository the latest revisions of files that need to be updated in the work directory. I then issue whatever file and directory rename/move/delete commands are needed and the tar command against the git-created archive if files have been updated. It is very complicated code but seems to work (so far). But if I can get rid of the symbolic links and use a hook that uses git checkout -f for deployment, I would much prefer that. I have only so far taken a cursory look at https://gitolite.com/deploy.html, but nothing there has dissuaded me from that route. I have used this technique successfully for several other repositories.
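To make point 3 concrete, here is a rough sketch of that diff-and-archive approach. It is not my actual hook: deploy_range is a name made up for this example, the temp directories and file names are invented, and it extracts one archive per changed file for simplicity rather than building a single archive. A real post-receive hook would read the old and new revisions from standard input instead of using HEAD~1 and HEAD, and file names containing tabs or other unusual characters are not handled.

```shell
#!/bin/sh
# Sketch of a diff-driven deployment (hypothetical helper, not the real hook).
set -eu

# deploy_range GIT_DIR WORK_TREE OLD_REV NEW_REV
deploy_range() {
    tab=$(printf '\t')
    git --git-dir="$1" diff --name-status --find-renames=100% "$3" "$4" |
    while IFS="$tab" read -r status path dest; do
        case "$status" in
            A|M|T)  # added, modified, or type-changed: extract the new version
                git --git-dir="$1" archive "$4" "$path" | tar -x -f - -C "$2" ;;
            D)      # deleted in the new revision
                rm -f -- "$2/$path" ;;
            R100)   # exact rename: move the deployed file
                mkdir -p -- "$(dirname -- "$2/$dest")"
                mv -- "$2/$path" "$2/$dest" ;;
        esac
    done
}

# --- throwaway demo ---
tmp=$(mktemp -d)
git init -q "$tmp/dev"
cd "$tmp/dev"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "one" > a.txt
echo "two" > b.txt
echo "gone" > old.txt
git add a.txt b.txt old.txt
git commit -q -m "first"

echo "changed" > a.txt
git mv b.txt c.txt
git rm -q old.txt
git add a.txt
git commit -q -m "second"

git clone -q --bare "$tmp/dev" "$tmp/bare.git"
mkdir "$tmp/deploy"

# Initial deployment of the older revision, then apply only the differences.
git --git-dir="$tmp/bare.git" archive HEAD~1 | tar -x -f - -C "$tmp/deploy"
deploy_range "$tmp/bare.git" "$tmp/deploy" HEAD~1 HEAD

ls "$tmp/deploy"
```

Note that nothing here touches $GIT_DIR/index, which is consistent with a repository deployed this way never acquiring an index file.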

Upvotes: 0

Views: 122

Answers (2)

Booboo

Reputation: 44128

First, deploying changes by having a bare, remote repository on a production server and a post-receive hook that executes a git checkout -f master command setting the --work-tree parameter appropriately seems reasonably reliable (it has been for me where I have been able to use it).
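For reference, a minimal sketch of that simple setup, run entirely in throwaway temp directories. The directory names (site.git, deploy, dev) and the file index.html are assumptions for the demo only; on a real server the hook would name the real deployment directory.

```shell
#!/bin/sh
# Sketch: a bare repository whose post-receive hook deploys with "git checkout -f".
set -eu

tmp=$(mktemp -d)
deploy="$tmp/deploy"
mkdir "$deploy"

git init -q --bare "$tmp/site.git"
mkdir -p "$tmp/site.git/hooks"

# While the hook runs, git has already set GIT_DIR to the bare repository,
# so only the work tree needs to be named.
cat > "$tmp/site.git/hooks/post-receive" <<EOF
#!/bin/sh
git --work-tree="$deploy" checkout -q -f master
EOF
chmod +x "$tmp/site.git/hooks/post-receive"

# A development repository with one commit on master.
git init -q "$tmp/dev"
cd "$tmp/dev"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hello" > index.html
git add index.html
git commit -q -m "first commit"

# Pushing triggers the hook, which populates the deployment directory.
git push -q "$tmp/site.git" master

ls "$deploy"
ls "$tmp/site.git/index"
```

In this sketch the checkout performed by the hook is also what writes the index file inside the bare repository.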

The situations I have been describing have been ones where I have not been able to use this method because my directory layout as defined in git does not correspond to the actual deployment directory layout for two reasons:

  1. A directory was named www which in the actual production server is a symbolic link for the real directory public_html. This in itself could have been easily remedied by simply renaming my directory. But I had other problems:
  2. Two of my git repositories were for websites that ended up being parked under the public_html directory described in 1. To handle this, I had to define a directory named work_dir that contained the top-level directories as defined in my git repository for my parked website but were actually symbolic links to the real directories located wherever I chose to put them on the production server.

For these reasons I had to write my own hook that analyzed the most recent commit and did the deployment. For the repository described in 1. above, which only had one symbolic link, I wanted to see what a git status command would generate if I set the --work-tree argument to point to the actual deployment work directory (realizing I would get erroneous results because of the www symbolic link, which I would ignore). But I discovered that all files were being shown as staged for deletion.

That's when I discovered that the $GIT_DIR/index file was missing, which explained the above results. And this is what I learned:

When one clones a bare repository there is never an index file; the index file normally gets created on the first push to this repository. In one case, for whatever reason, it was never created, and that may have something to do with --work-tree pointing to a directory containing nothing but symbolic links. Yet in another case with a similar setup there is an index file. I cannot recall the complete history of the creation of these repositories to explain these discrepancies. But at least I have the answer to my original question as to why files appear to be staged for deletion, and I know what to do if I decide for the one repository to rename the www directory to public_html so I can get away from using my complicated hook.
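The behavior can be reproduced in a few lines. This is only a sketch using temp directories; the names dev, bare.git, deploy and file1 are placeholders for the demo.

```shell
#!/bin/sh
# Sketch: a fresh bare clone has no index, so status reports staged deletions.
set -eu

tmp=$(mktemp -d)
git init -q "$tmp/dev"
cd "$tmp/dev"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hello" > file1
git add file1
git commit -q -m "initial"

git clone -q --bare "$tmp/dev" "$tmp/bare.git"

# Stand-in for a deployment directory that already holds the HEAD version.
mkdir "$tmp/deploy"
cp file1 "$tmp/deploy/file1"

[ -e "$tmp/bare.git/index" ] || echo "bare clone has no index file"

echo "--- status before reset ---"
before=$(git --git-dir="$tmp/bare.git" --work-tree="$tmp/deploy" status --porcelain)
echo "$before"

# A mixed reset rebuilds the index from HEAD without touching the files.
git --git-dir="$tmp/bare.git" --work-tree="$tmp/deploy" reset -q

echo "--- status after reset ---"
after=$(git --git-dir="$tmp/bare.git" --work-tree="$tmp/deploy" status --porcelain)
echo "$after"
```

Before the reset, file1 should be listed both as a staged deletion and as an untracked file; after the reset, the status output should be empty.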

Upvotes: 0

torek

Reputation: 488193

The root of the problem is that you are trying to use Git as a deployment system as well as a version control system. As a deployment system, Git leaves a lot to be desired. (Well, some might say that about it as a VCS too, but... :-) )

  • In my opinion, using Git as a deployment tool is a mistake. The big danger here is not that it doesn't work, but rather that it almost works.

  • For another opinion (the other guy thinks it's OK in the end, with many caveats), see https://gitolite.com/deploy.html.

  • Note that Heroku and Ruby on Rails provide their own deployment wrappers (which I have never used), and various CI/CD systems have deployment tools and/or wrappers. Python comes with pip, which is better than nothing, but still has some issues: see, e.g., https://www.nylas.com/blog/packaging-deploying-python/. See https://stackify.com/top-deployment-tools-2018/ for a slightly out of date (2018) list of various deployment tools.

Lots of detail

Serge's comment/suggestion of git reset --mixed should actually "fix" the issue, but as you note it might well recur.

All I can say for sure, not having all your code for viewing and testing, is this: when you run:

git --git-dir=<path1> --work-tree=<path2> status

or:

GIT_DIR=<path1> GIT_WORK_TREE=<path2> git status

the git status command is going to compare these three items against each other:

  1. the HEAD commit in the specified Git dir;
  2. the index—that's the, single, one, primary index—in the specified Git dir;
  3. the specified work-tree

You're seeing "staged for delete". Any time you see "changes staged for commit", this means that in comparing item #1 with item #2, Git found a difference; for a staged deletion in particular, Git sees that the HEAD commit does have these files, and the index does not have these files. Exactly what removed those files from the index (and maybe put any other files, ones that you don't want to be in the index, into the index), well, who can say?

If you see changes "not staged for commit", that means that when comparing item #2 (the index for the Git directory) to item #3, your chosen work-tree, Git found that what's in the index does not match what is in the work-tree.
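The two comparisons can be run separately, which sometimes makes it easier to see which side is off. The following is a small sketch: compare_three is a made-up helper name, and the repository and files are throwaway demo material.

```shell
#!/bin/sh
# Sketch: show the two comparisons that git status combines.
set -eu

# compare_three GIT_DIR WORK_TREE   (hypothetical helper)
compare_three() {
    echo "HEAD vs index (staged):"
    git --git-dir="$1" --work-tree="$2" diff --cached --name-status
    echo "index vs work-tree (not staged):"
    git --git-dir="$1" --work-tree="$2" diff --name-status
}

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "initial"

git rm -q --cached a.txt   # HEAD has a.txt, the index no longer does
echo "more" >> b.txt       # work-tree copy now differs from the index

out=$(compare_three "$tmp/repo/.git" "$tmp/repo")
echo "$out"
```

Here a.txt should show up with status D in the first comparison, even though the file is still sitting in the work-tree, and b.txt with status M in the second.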

It's quite likely that by abusing Git as a deployment tool, you're causing the (single) index to be overwritten at various points by the (multiple, different) deployments to (multiple, different) work-trees.[1]

Again, the real root of the problem is that Git is not a deployment tool in the first place. But we can see from the description so far that this will happen any time you attempt to use the (single) index to describe multiple different work-trees.

Note that the version of Git on your laptop is entirely irrelevant. The issue is that on the server, you're running Git commands, using the bare repository as the repository. As you do this, you temporarily (for one Git command) give that bare repository a work-tree. Its (single) index is now used, for that one Git command, to index the work-tree. Then, at some point, you run another Git command, with the same repository, but a different work-tree. That (single) index is now being used for the other work-tree.

The index in a bare repository is somewhat vestigial. It does exist, so you can use Git as a deployment tool like this, but if you are going to do that, you must use one index per temporarily-instantiated work-tree. To do that, set GIT_INDEX_FILE in the environment, per work-tree. Or, since the primary function of the index is to be used for making new commits, and Git will rebuild it automatically if it's removed, you can remove it—knowing that it lives in $GIT_DIR/index—and reset, but if you do this, you're setting things up for terrible failure modes if and when two deployments try to run simultaneously.
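As a rough illustration of the one-index-per-work-tree idea, here is a sketch. The helper names (deploy_with_index, status_with_index), the index file names (index.site-a, index.site-b), and the directories are all invented for the example; where you keep the per-work-tree index files is up to you.

```shell
#!/bin/sh
# Sketch: one bare repository, two work-trees, one index file per work-tree.
set -eu

# deploy_with_index GIT_DIR WORK_TREE INDEX_FILE   (hypothetical helper)
deploy_with_index() {
    GIT_INDEX_FILE="$3" git --git-dir="$1" --work-tree="$2" checkout -q -f
}

# status_with_index GIT_DIR WORK_TREE INDEX_FILE   (hypothetical helper)
status_with_index() {
    GIT_INDEX_FILE="$3" git --git-dir="$1" --work-tree="$2" status --porcelain
}

tmp=$(mktemp -d)
git init -q "$tmp/dev"
cd "$tmp/dev"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hello" > file1
git add file1
git commit -q -m "initial"

git clone -q --bare "$tmp/dev" "$tmp/bare.git"
mkdir "$tmp/site-a" "$tmp/site-b"

# Use absolute paths for the index files so they do not depend on the
# directory git happens to be running in.
deploy_with_index "$tmp/bare.git" "$tmp/site-a" "$tmp/bare.git/index.site-a"
deploy_with_index "$tmp/bare.git" "$tmp/site-b" "$tmp/bare.git/index.site-b"

status_a=$(status_with_index "$tmp/bare.git" "$tmp/site-a" "$tmp/bare.git/index.site-a")
status_b=$(status_with_index "$tmp/bare.git" "$tmp/site-b" "$tmp/bare.git/index.site-b")
echo "site-a status: [$status_a]"
echo "site-b status: [$status_b]"
```

Every later Git command for a given work-tree has to be run with the same GIT_INDEX_FILE setting, or it silently falls back to the shared $GIT_DIR/index.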


[1] When using git worktree add, which requires Git 2.15 or later to be really stable (it has a bad bug in earlier versions), Git creates a new index for each added work-tree. In theory, adding work-trees to a bare repository might be one way to make Git work rather better as a deployment tool. But Git still isn't actually a deployment tool, so this is analogous to discovering that some screwdrivers work better as chisels than do other screwdrivers. Sure, that particular flat-bladed screwdriver doesn't explode in your hand nearly as often as those other ones you were using a moment ago, but it's still not designed to be used that way.
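For the curious, a sketch of what adding a work-tree to a bare repository looks like, with the same caveat that this is the better screwdriver, not a chisel. The paths and the work-tree name wt-a are placeholders, and the work-tree is added in detached-HEAD state so that no branch name has to be assumed.

```shell
#!/bin/sh
# Sketch: an added work-tree gets its own index under $GIT_DIR/worktrees/.
set -eu

tmp=$(mktemp -d)
git init -q "$tmp/dev"
cd "$tmp/dev"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hello" > file1
git add file1
git commit -q -m "initial"

git clone -q --bare "$tmp/dev" "$tmp/bare.git"

# Check out the current HEAD commit into a new, separate work-tree.
git --git-dir="$tmp/bare.git" worktree add --detach "$tmp/wt-a" HEAD

git --git-dir="$tmp/bare.git" worktree list
ls "$tmp/bare.git/worktrees/wt-a/index"
```

The per-work-tree index lives in $GIT_DIR/worktrees/wt-a/index, so the bare repository's own $GIT_DIR/index is not involved.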

Upvotes: 1
