Git says there are changes but there are none

Question

First, I was unable to discard changes to a file: git reset --hard ran without errors, but the changes remained. I tried several suggestions from other Stack Overflow posts.

git rm .gitattributes
git add -A
git reset --hard

git rm --cached [fileName]

I even blew away my repo, and immediately upon cloning the repo fresh it had the same modified file. If I manually make changes to the file (removing what Git thinks is different, using an editor), I can see it in the modified list twice now (as in, when I run git status I see the same file with the same path listed twice under unstaged files). I finally found this one, which actually reset the file and removed it from the modified list.

git ls-files -m | xargs -i git update-index --assume-unchanged "{}"

Now, I tried to switch branches and it says changes will be overwritten, I must commit or stash my changes, yet, when I run git stash it tells me there are no changes. I've tried several suggestions, including trying to update line endings and nothing is letting me checkout a branch. The branch does have a "/" in it, could that be causing issues?

C:\git\azureWebApps\CRM-WebAPI (qa -> origin)λ git status
On branch qa
Your branch is up to date with 'origin/qa'.

nothing to commit, working tree clean

C:\git\azureWebApps\CRM-WebAPI (qa -> origin)λ git diff

C:\git\azureWebApps\CRM-WebAPI (qa -> origin)λ git checkout feature/PEN-146-CreateUpdatePerson
error: Your local changes to the following files would be overwritten by checkout:
        CRM-RestAPI/web.config
Please commit your changes or stash them before you switch branches.
Aborting

C:\git\azureWebApps\RM-WebAPI (qa -> origin)λ git stash
No local changes to save

torek · Accepted Answer

[Note: this turned out to be a file name case issue after all; see "edit" below.]

... immediately upon cloning the repo fresh it had the same modified file

This means one of two things is going on:

the file is getting changed, or
Git's idea of which file is which does not correspond to reality on your machine.

If I manually make changes to the file, I can see it in the modified list twice now.

This makes less sense. It would help if you included a cut-and-paste of what you mean.

[Edit: per comments, this was something like—I would use the actual text but I have to recreate it since I don't have the issue myself—the following:

(boilerplate snipped)
    modified:   CRM-RestAPI/Web.config
    modified:   CRM-RestAPI/web.config

While Git treats these as two separate files (as they would be on a case-sensitive file system), if your OS does case-folding (as Windows and MacOS do by default), there will be only one file on disk, named with either an uppercase or a lowercase W, never both. This is a specific example of a general issue I describe below, where Git stores file names as nearly-arbitrary byte strings, but not all OSes do so.]

Background that you will need to locate and solve the problem

Since it's not clear what the problem actually is yet, it's impossible to solve it yet. It might be line-ending issues, or it might be something else. You'll need the following information.

Every file exists in three versions

In most cases—which always includes any fresh clone—each file that you can see exists in three versions. All three should normally be the same, but they can be different (on purpose or not).

In any case, you have a current commit, whose hash ID you can find with:

git rev-parse HEAD

The hash ID of the current commit changes as you check out different commits or make new commits, but there's always some current commit (with an exception that doesn't occur here).

Each commit lists a bunch of files that should be checked out if you git checkout that particular commit. You can see those files, if you want, using:

git ls-tree -r HEAD

which shows you in gory detail every file that goes with that commit.

Every commit is read-only—the files stored in this commit, under this hash ID, are stored permanently¹ there, and they can never change.

The second copy of every file is kept in Git's index. This is the thing you are manipulating with git update-index --assume-unchanged. The index is a central data structure that Git uses for many things, but it's perhaps best described as where you (and Git) build the next commit you will make. As such, the index normally starts out exactly matching the current commit. Every file that is in the current commit is also in the index, in the same special, Git-only, compressed format that Git uses. (Technically the index simply shares the commit's copy of the file.) The important difference between the index copy and the commit copy is that the index copy can be overwritten, after which the index is no longer sharing the commit's version of the file. The index copy is still in the special Git-only compressed format, but unlike the committed copy, you can overwrite the index copy.
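Since git update-index --assume-unchanged only sets a flag on index entries, it can help to see which entries currently carry that flag. The following is a minimal sketch, not a definitive recipe: it relies on git ls-files -v printing a lowercase tag letter for assume-unchanged entries, and both function names are made up for illustration.

```shell
# List index entries that carry the assume-unchanged flag.
# `git ls-files -v` tags such entries with a lowercase letter (usually "h").
list_assume_unchanged() {
    git ls-files -v | grep '^[[:lower:]] ' | cut -c3-
}

# Clear the flag on every flagged entry so that `git status` reports them again.
clear_assume_unchanged() {
    list_assume_unchanged | while IFS= read -r path; do
        git update-index --no-assume-unchanged -- "$path"
    done
}
```

Run inside the repository, list_assume_unchanged prints one path per line. Paths containing unusual characters may be printed quoted by Git, which this sketch does not handle.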

The last copy of every file is the one you actually work with. This file is in its normal everyday form on the computer, not in a special Git-only format. Because it is in the normal form, it's subject to whatever limitations your system imposes, and that's where we get into the interesting parts.

¹As permanent as the commit itself, anyway. If you get Git to forget about the commit, the files themselves will go away unless some other commit(s) is/are sharing them.
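To make the three versions concrete, the sketch below prints one path as stored in the current commit, in the index, and in the work-tree. show_three_copies is a hypothetical helper name; the HEAD:path and :path forms are ordinary Git revision syntax, and the path is taken relative to the top of the repository.

```shell
# Print the three versions of one file: committed, indexed, and work-tree.
show_three_copies() {
    path=$1
    echo "--- HEAD (commit) ---"
    git show "HEAD:$path"      # copy stored in the current commit
    echo "--- index ---"
    git show ":$path"          # copy stored in the index
    echo "--- work-tree ---"
    cat -- "$path"             # ordinary file on disk
}
```

For the file in the question, the call would be show_three_copies CRM-RestAPI/web.config, run from the top level of the clone.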

HEAD, index, and work-tree

We can illustrate the three copies by labeling them a bit:

  HEAD        index     work-tree
---------   ---------   ---------
README.md   README.md   README.md
somefile    somefile    somefile

and so on. Git copies the files between these various versions, except that the HEAD (committed) version is always read-only, so to "change" the committed version, Git builds a new commit from whatever is in the index right now.

The git status command tells you about these by comparing, first, the HEAD version of every file to the index version of every file. If something is different here, git status prints the file's name and tells you that this is a change that is ready to be committed. Then, it compares the index version of every file to the work-tree version of every file. If something is different here, git status prints the file's name and tells you that this is a change that is not yet staged for commit.
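The two comparisons can also be run one at a time, which sometimes makes it easier to see where a phantom change lives. This is only a sketch; the function names are invented here, and untracked files (which git status also lists) are not covered.

```shell
# The two comparisons that `git status` summarizes, run separately.
staged_changes() {
    git diff --cached --name-only   # HEAD vs. index: "changes to be committed"
}

unstaged_changes() {
    git diff --name-only            # index vs. work-tree: "not staged for commit"
}
```

A file that appears in neither list but still blocks git checkout suggests that something other than content, such as an index flag or a file-name mismatch, is involved.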

The git checkout command copies files from a commit to the index and work-tree, or from the index to the work-tree. (These should be separate commands—and were at one point.) The git reset command copies files from the commit to the index, but not to the work-tree (unless you use --hard, which overwrites the work-tree copy as well). The git add command copies files from the work-tree to the index. The git commit command makes a new commit from whatever is in the index, and then arranges things so that HEAD now refers to the new commit.
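These copy directions can be watched in a throwaway repository. The session below is a sketch under assumptions: the scratch directory, the identity settings, and the file contents are all made up for the demonstration.

```shell
# Scratch repository to watch a file move between HEAD, index, and work-tree.
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name "Example" && git config user.email "example@example.com"

echo v1 > somefile
git add somefile                 # work-tree -> index
git commit -q -m "first"         # index -> new commit; HEAD now names it

echo v2 > somefile
git add somefile                 # work-tree -> index (index now holds v2)
git reset -q -- somefile         # HEAD -> index (index back to v1; disk untouched)
after_reset=$(cat somefile)      # still v2
git checkout -- somefile         # index -> work-tree (disk back to v1)
after_checkout=$(cat somefile)   # v1 again
```

After the last step all three copies agree, so git status reports a clean work-tree.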

The things of interest, or, What to look for

Now that you know what's in the various parts, here's where things can go wrong.

HEAD and the index need not use the computer native name format

The names of files stored in commits and in the index are just byte-strings in Git. Git is generally "encoding agnostic", as the phrase goes, except that it uses slashes to separate directory names from sub-directories and files, and an ASCII NUL byte to terminate these byte-strings. This allows Git to use UTF-8 to encode file names, since UTF-8 encoding never encodes any character other than the slash / (ASCII 0x2f) as byte code 0x2f. If you're on a system that uses backslash instead of slash, either it also allows forward slash internally, or Git translates the slashes as necessary, so that this all works.

This also means that Git's file names are case-sensitive: the file README is entirely different from the file readme, which is different from the two different files Readme and ReadMe. The same holds for directory names.

Meanwhile, your own computer may have a case-insensitive file system: there's only one file here, whose name is whichever is the first of those you chose. If you have a file named ReadMe and you open README, you get ReadMe, not a new file named README. (This is the case on Windows and MacOS by default.)
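A quick way to look for names that differ only in case is to fold every path to lowercase and report duplicates. The helper below is a sketch with an invented name; it reads paths on standard input so that it works with any source of names, and it only catches whole paths that collide, not directories that differ in case while holding different files.

```shell
# Read paths on stdin (one per line) and print, lowercased, every name that
# occurs more than once when case is ignored.
find_case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}
```

Feeding it the index, as in git ls-files | find_case_collisions, would print crm-restapi/web.config for the repository in the question if both spellings are tracked. To check a specific commit instead, git ls-tree -r --name-only HEAD can supply the names.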

Similarly, if your computer normalizes names like schön, there are two different UTF-8 spellings for this name, and Git will treat them as two different file names, but your computer will treat both as referring to one file. (This is the case on MacOS; I am not sure about Windows.)
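The two spellings can be seen at the byte level. This sketch builds both forms of schön with printf octal escapes and dumps them with od; the variable names are illustrative only.

```shell
# Two UTF-8 byte sequences that both display as "schön".
precomposed=$(printf 'sch\303\266n')    # ö as one code point, U+00F6
decomposed=$(printf 'scho\314\210n')    # o followed by combining diaeresis, U+0308

printf '%s' "$precomposed" | od -An -tx1   # hex bytes of the precomposed spelling
printf '%s' "$decomposed"  | od -An -tx1   # hex bytes of the decomposed spelling

nfc_len=$(printf '%s' "$precomposed" | wc -c | tr -d ' ')   # 6 bytes
nfd_len=$(printf '%s' "$decomposed"  | wc -c | tr -d ' ')   # 7 bytes
```

Because Git compares names as byte strings, the 6-byte and 7-byte names are different files to Git even though they look identical on screen.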

If this is the problem, it's rather pervasive and difficult to deal with. Your best bet is to bring up a Unix or Linux system, which doesn't do case-folding and normalization, and work with the repository to eliminate the problematic file names. You can then check out any of the commits that has been fixed, since those commits no longer provide names that trip up your OS.

Line-endings and other filters

Aside from file names, you have also seen that Git can fiddle with line endings. Repositories made on a Linux or Unix-like system will generally use newline-only (LF-only) line endings, while files to be edited on a Windows system may require carriage-return-newline sequences (CR-LF or CRLF endings). To enable cross-system work, Git offers the ability, but not the requirement, to do some sneaky line-ending changes.

The way this works in general is that Git refers to some files as clean and some as smudged. What's stored in commits and in the index—i.e., in the compressed Git-only format—is always assumed to be clean. Whenever Git copies a file from the index to the work-tree, it smudges the file, and whenever it copies that same file from the work-tree back into the index, it cleans the file.

If you enable CRLF line endings, the smudging process includes changing LF-only to CRLF, and the cleaning process includes changing CRLF to LF-only.² This means that as long as all files in the repository are truly clean, they become properly smudged on your Windows system, and are re-cleaned when you git add them before you git commit the cleaned files.

But—this is the key point here—all of this process is optional. The Linux users don't pay for any of it because they turn all of it off, and the Linux-side Git repository then stores whatever is in the work-tree into the index version, even if the work-tree file has CRLF line endings. These can then be committed, so that the commits' contents include the CRLF endings.

If you extract such a file onto a Windows machine and have CRLF conversion turned on, the index->work-tree (smudge) step leaves the existing CRLFs alone. But cleaning any of these work-tree files changes their CRLFs to LF-only, so the cleaned result no longer matches the CRLF-laden copy stored in the index! Every such file is instantly changed (but not yet ready to be committed).
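One way to spot CRLFs that made it into the repository, assuming Git 2.8 or later, is git ls-files --eol, which reports the line endings found in the index copy (i/) and in the work-tree copy (w/) of each file. The scratch repository and file name below are made up for illustration.

```shell
# Scratch repository holding a file that was staged with CRLF endings.
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config core.autocrlf false        # no line-ending conversion in this repo
printf 'line one\r\nline two\r\n' > web.config
git add web.config                    # the index copy keeps its CRLFs

# i/ = line endings in the index copy, w/ = line endings in the work-tree copy.
eol_report=$(git ls-files --eol -- web.config)
echo "$eol_report"
```

An entry showing i/crlf means the stored copy itself contains CRLFs, which is the situation described above.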

This situation is also extra-tricky because Git attempts, through various means, to know when work-tree files are smudged or clean without running them through all the smudging and cleaning processes. (It's kind of slow—very slow, in some cases—so this is usually important.) But this means that the "changed-ness" of files is somewhat unpredictable and hard to diagnose. The trick is to inspect the raw file content, which is easiest if you once again clone the repository on a Linux system, where there's no manipulation at all of line endings. You can then use a line-ending-aware inspector to see what's in the file.

(You can do this on other systems using git cat-file -p to extract particular files without any smudging or other filters or text conversions, and examine the resulting byte-stream with a line-ending-aware inspector. How to do the latter on Windows, I have no idea—I avoid Windows systems, as a rule. MacOS has cat -v and hexdump.)
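For a rough check without a hex viewer, counting carriage-return bytes in the raw stream is often enough. count_cr below is a hypothetical helper, a sketch that reads whatever arrives on standard input.

```shell
# Count carriage-return bytes in whatever arrives on stdin.
count_cr() {
    tr -cd '\r' | wc -c | tr -d ' '
}
```

Piping git cat-file -p HEAD:CRM-RestAPI/web.config into count_cr inspects the committed copy, git cat-file -p :CRM-RestAPI/web.config does the same for the index copy, and count_cr < CRM-RestAPI/web.config covers the work-tree file. A count of zero means LF-only content; differing counts between the copies point at a line-ending conversion.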

²We say "includes" rather than "consists of" here because you can write your own smudge and clean filters, which are applied in addition to the CRLF tweaking.