Leo Jiang

Reputation: 26085

Git: there's a file that shows it changed whether I remove it or not, how can I leave it unchanged?

I think someone changed the capitalization of a file, making it impossible for me to leave it as unchanged in a PR. After I did git pull and git rebase -i origin/master, git status shows I added a file that I never touched. I tried running git rm <filename>, but then it shows that I deleted the file.

I ran git checkout origin/master -- <filename>, which added back the file, but git status shows I added the file again.

I noticed that when I created a commit after removing the file, it shows two different capitalizations. I.e.:

[branchname 45dd45ce2] wip
 2 files changed, 140 deletions(-)
 delete mode 100644 Filename
 delete mode 100644 filename

I tried renaming the file to both capitalizations, but Git always saw it as a new file.

How can I leave it unchanged?

I'm using WSL with Ubuntu 18.04.

Upvotes: 0

Views: 148

Answers (2)

torek

Reputation: 488183

TL;DR: your system is less capable than actual Linux, and this has bitten you

The Linux system that made the commit stored two files whose names differ only in case. Your system can't do that, so you cannot work within this framework on your own system, at least not directly. If you spin up a Linux VM inside your Windows system, you can work with it there. But there is a general method for dealing with this within your own system, which I will show as the last section of the long part. It has some flaws, but it can let you make progress.

(Really, though, the best solution is to spin up a Linux instance, and fix it directly.)

Long

I noticed that when I created a commit after removing the file, it shows two different capitalizations. I.e.:

[branchname 45dd45ce2] wip
 2 files changed, 140 deletions(-)
 delete mode 100644 Filename
 delete mode 100644 filename

What this means is that the commit you had checked out (the parent of the new commit 45dd45ce2) has, inside it, both spellings of this file name. Linux can do this, but Windows can't.[1]


[1] Technically, this is file-system specific. The problem occurs on file systems that fold case, and Windows and macOS do that by default, while Linux does not do that by default. Apparently WSL uses the underlying Windows file system by default, thereby importing its features and limitations.
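If you are not sure which kind of file system you are on, a quick probe can tell you. This is only a sketch: the directory and file names (case-probe, CaseProbe) are made up for the test, and it checks the directory you run it in, which may differ from other mount points on the same machine.

```shell
# Probe whether the current directory's file system folds case.
# The probe names are arbitrary and exist only for this sketch.
probe_dir=$(mktemp -d ./case-probe.XXXXXX)
touch "$probe_dir/CaseProbe"

# On a case-folding file system the lowercase spelling finds the same file.
if [ -e "$probe_dir/caseprobe" ]; then
    fs_case="insensitive"
else
    fs_case="sensitive"
fi

rm -rf "$probe_dir"
echo "file system is case-$fs_case"
```

Git runs a similar probe when it creates a repository and records the answer in the core.ignorecase setting, so git config core.ignorecase inside a clone is another way to check.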


Let's take a step back, first, and look at what Git really does with commits. Remember that a commit contains a complete snapshot of some set of files. For each file, the commit stores the file's name, its execute permission flag, and its content. The commit itself is identified by a unique, big ugly hash ID, like 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2 for instance (this is a commit in the Git repository for Git itself). That commit also stores a log message, the commit author's name and email address, and a parent hash ID, among other data; but at the moment, we're going to concentrate on the files stored via the commit, with particular attention to their names. First, though, let's look briefly at the contents, because that part is of interest as well.

Every file is stored inside every commit. If Git stored a new copy every time, this would quickly make your repository very large. So Git doesn't store a new copy, if that's easy. In particular, if a new commit uses the same contents for most of its files, Git simply re-uses the old contents in the new commit. That means Git had better not touch the existing saved files' contents, so it doesn't: they're read-only. Meanwhile, to make the repository smaller, Git compresses these contents as well. So each file's content, stored in a Git repository, is in a special, read-only (hence share-able), compressed—sometimes very compressed—Git-only form. (Git calls these blobs. "Blob" is one of four internal Git object types, with the other three being tree, annotated tag, and commit. The names, meanwhile, are stored in those "tree" objects. You don't need to know these details, but they're sometimes useful.)
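To see the sharing for yourself, you can ask Git for the blob hash ID of two files with identical contents. A minimal sketch, run in a throwaway repository; the file names first.txt and second.txt are invented for the demonstration:

```shell
# Two files with identical contents get the same blob hash ID,
# so Git only needs to store that content once.
work=$(mktemp -d)
cd "$work" || exit 1
git init -q .

printf 'same contents\n' > first.txt
printf 'same contents\n' > second.txt

# hash-object computes the blob ID without adding anything to the repository.
id_first=$(git hash-object first.txt)
id_second=$(git hash-object second.txt)

echo "$id_first"
echo "$id_second"
```

Both lines of output should be the same hash ID, because the ID depends only on the content and not on the file's name.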

Once it is made, each commit is also read-only. In fact, this is true of all of Git's internal objects. Each object's hash ID is simply a cryptographic checksum of the object's data. This lets Git be sure that the data are intact, when it looks at the object again later: the current checksum of the data must match the hash ID used to find the object. If they do match, the data are correct; if not, something has somehow corrupted the commit. This is why you can't change a commit: if you change any of the data, the checksum changes, and you have instead a new and different commit. But the point we're concerned with here is that the commit, once made, is frozen in time: nothing can change inside it, and that includes the names of the files.
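One way to watch this happen is to "change" a commit with git commit --amend and compare hash IDs. This is a sketch in a scratch repository; the user name and email are placeholders, not anything from the question:

```shell
# Amending a commit does not modify it: Git writes a new commit
# with a new hash ID, and the old one stays exactly as it was.
work=$(mktemp -d)
cd "$work" || exit 1
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "test case issues" > README
git add README
git commit -q -m "initial"
before=$(git rev-parse HEAD)

# Only the log message differs, yet the commit hash ID changes.
git commit -q --amend -m "initial, reworded"
after=$(git rev-parse HEAD)

# The saved snapshot (tree) is shared between the two commits.
tree_before=$(git rev-parse "$before^{tree}")
tree_after=$(git rev-parse "$after^{tree}")

echo "commit before: $before"
echo "commit after:  $after"
echo "tree before:   $tree_before"
echo "tree after:    $tree_after"
```

The two commit IDs differ while the two tree IDs match, and git cat-file -t "$before" still reports the original object as a commit.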

Nonetheless, the entire Git repository, in its special Gitty form, can be transferred to another system. Once it is, those commits can be extracted ... well, sort of. This is where the problems begin.

When Git checks a commit out from the repository, it must copy the frozen, read-only blobs out of the freezer, thaw them, and put them into normal everyday format so that you can actually use the files. Git does this in two steps: first it copies the frozen object into Git's index, where it's unfrozen but still in the special compressed Git-only format, using Git's own internal method for remembering the file's name and execute-permission bit, and then it uncompresses the frozen blob into your work-tree where you can work on it.
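The two steps can be run by hand with plumbing commands. This is only an approximation of what git checkout does (the real command also handles safety checks, removals, and more), shown in a scratch repository with placeholder identity settings:

```shell
# Set up a repository with one committed file.
work=$(mktemp -d)
cd "$work" || exit 1
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "this file uses lowercase" > filename
git add filename
git commit -q -m "initial"

# Start over with an empty index and an empty work-tree.
rm filename
git read-tree --empty

# Step 1: copy the commit's files into the index (nothing on disk yet).
git read-tree HEAD
in_index=$(git ls-files)
if [ -e filename ]; then on_disk_after_step1=yes; else on_disk_after_step1=no; fi

# Step 2: expand the index entries into ordinary files in the work-tree.
git checkout-index --all
contents=$(cat filename)

echo "in index after step 1: $in_index"
echo "on disk after step 1:  $on_disk_after_step1"
echo "contents after step 2: $contents"
```

After step 1 the name is in the index but no file exists on disk; only step 2 has to deal with the file system, which is why that is where case folding gets in the way.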

This last step is where things go wrong. Git needs to create one file named Filename, and another, different file named filename. On Linux, that's easy: just call the file-creator with the two names. On Windows, if you do that, the second file overwrites the first one, keeping whichever name you used first.

This means that no matter what you do, you end up with only one file in your work-tree, even though you have both files in your commit (in the special Git-only frozen format) and in your index (in the special Git-only format, unfrozen). This situation is difficult and painful. However, Git makes new commits from the index, so all is not yet lost.
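To illustrate that commits come from the index and not from the work-tree, here is a sketch that builds a commit holding both spellings without ever creating either file on disk. It sets core.ignorecase to false in the scratch repository so that it behaves the same on any file system; the identity settings are placeholders:

```shell
work=$(mktemp -d)
cd "$work" || exit 1
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
git config core.ignorecase false           # treat names exactly, for this sketch

# Store two blobs directly in the repository; no work-tree files are made.
upper=$(printf 'THIS FILE USES UPPERCASE\n' | git hash-object -w --stdin)
lower=$(printf 'this file uses lowercase\n' | git hash-object -w --stdin)

# Put both names into the index, pointing at those blobs.
git update-index --add --cacheinfo 100644,"$upper",Filename
git update-index --add --cacheinfo 100644,"$lower",filename

# The commit is built from the index, so it contains both names.
git commit -q -m "two names differing only in case"
names=$(git ls-tree -r --name-only HEAD)
echo "$names"
```

The work-tree stays empty the whole time, yet the new commit lists both Filename and filename.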

A workaround

You can, on your Windows or macOS system (Mac file systems have this same issue, as we saw in footnote 1), make a new commit in which one of the two names in the index has been renamed. I started by creating a repository with three files:

$ mkdir case
$ cd case
$ git init
Initialized empty Git repository in ...
$ echo test case issues > README
$ echo THIS FILE USES UPPERCASE > FILENAME
$ echo this file uses lowercase > filename
$ ls
filename        FILENAME        README
$ git add *
$ git commit -m initial
[master (root-commit) 46e94a6] initial
 3 files changed, 3 insertions(+)
 create mode 100644 FILENAME
 create mode 100644 README
 create mode 100644 filename

I then cloned this repository to a Mac:

$ git clone ssh:[url]
Cloning into 'case'...
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 5 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (5/5), done.
$ cd case
$ git status --short
 M FILENAME
$ ls
README          filename
$ git ls-files
FILENAME
README
filename

The trick now is to rename one of the two files in the index. I don't like all-caps in general, so let's rename the uppercase one now:

$ git mv FILENAME UC-FILENAME

(perhaps I should have mv-ed it to the name yucky-filename :-) ). One can use git ls-files to check that this worked (or git ls-files --stage to get the verbose version), and I did, but I will just show the commit next:[2]

$ git commit -m 'fix case-collision'
[master 7712644] fix case-collision
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename FILENAME => UC-FILENAME (100%)

Now we have to fix the work-tree, which is out of sync with the index and repository. The easy way to do that is using git reset --hard:

$ git reset --hard
HEAD is now at 7712644 fix case-collision
$ ls
README          UC-FILENAME     filename
$ cat UC-FILENAME
THIS FILE USES UPPERCASE
$ cat filename
this file uses lowercase

We could now push this back, if the receiving repository were a --bare one (it's not), but the point is that we can now work with the files natively (in this case, on this particular Mac) as they no longer conflict with the native file system.


[2] These are terrible commit messages. Use something better when dealing with a real repository, rather than a test case.

Upvotes: 1

HairOfTheDog

Reputation: 2737

Resetting your local repository to mirror that of the remote should resolve the problem for you.

git reset --hard origin/master

Of course you should be aware that resetting your local repository will cause you to lose any work not yet pushed to the remote. So make a backup of the directory before resetting. (I like to zip the directory in these situations.)

Upvotes: 0
