Danilo Pianini

Reputation: 1106

git apparently keeps saying that a file has been modified when it has not

I am experiencing a curious situation, probably related to this question, but I'd like to better understand what is going on here.

I have a repository where right after a clone git status reports that a file has been modified.

I created a minimal reproduction here, with a repo containing just the ignore list, a very trivial .gitattributes, and the file causing me headaches: gradlew.bat.

All my attempts in the following are performed using Linux/ZSH (the issue has been reproduced on multiple Linux installations and shells).

Right after clone, if I run git status, I get:

❯ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   gradlew.bat

no changes added to commit (use "git add" and/or "git commit -a")

And if I try to check out the unmodified version with git checkout HEAD -- gradlew.bat, then issue git status again:

❯ git checkout HEAD -- gradlew.bat
❯ git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   gradlew.bat

no changes added to commit (use "git add" and/or "git commit -a")

Okay then, I downloaded the file directly from GitHub, and checked the hashes:

❯ md5sum gradlew.bat
6b56324406b764fd6c5d4d7d215a3cd7  gradlew.bat
❯ sha512sum gradlew.bat
d4fef021e30640670fe20243e4fc4f0336b2f118f8c172c138a8c0c3028c93b12da9479812cede4196401bbc87ce9df89573dbec7378373cafafca6698867f55  gradlew.bat

These are exactly the same as the hashes of the file that git marks as changed:

❯ md5sum gradlew.bat && sha512sum gradlew.bat
6b56324406b764fd6c5d4d7d215a3cd7  gradlew.bat
d4fef021e30640670fe20243e4fc4f0336b2f118f8c172c138a8c0c3028c93b12da9479812cede4196401bbc87ce9df89573dbec7378373cafafca6698867f55  gradlew.bat

This means it's not even a matter of LF/CRLF line endings.

git diff is not helpful either, as it just suggests that the file changed entirely:

diff --git a/gradlew.bat b/gradlew.bat
index ac1b06f..107acd3 100755
--- a/gradlew.bat
+++ b/gradlew.bat
@@ -1,89 +1,89 @@
-@rem
-@rem Copyright 2015 the original author or authors.
-@rem
-@rem Licensed under the Apache License, Version 2.0 (the "License");
-@rem you may not use this file except in compliance with the License.
-@rem You may obtain a copy of the License at
-@rem
-@rem      https://www.apache.org/licenses/LICENSE-2.0
-@rem
-@rem Unless required by applicable law or agreed to in writing, software
-@rem distributed under the License is distributed on an "AS IS" BASIS,
-@rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-@rem See the License for the specific language governing permissions and
-@rem limitations under the License.
-@rem
-
-@if "%DEBUG%" == "" @echo off
-@rem ##########################################################################
-@rem
-@rem  Gradle startup script for Windows
-@rem
-@rem ##########################################################################
-
-@rem Set local scope for the variables with windows NT shell
-if "%OS%"=="Windows_NT" setlocal
-
-set DIRNAME=%~dp0
-if "%DIRNAME%" == "" set DIRNAME=.
-set APP_BASE_NAME=%~n0
-set APP_HOME=%DIRNAME%
-
-@rem Resolve any "." and ".." in APP_HOME to make it shorter.
-for %%i in ("%APP_HOME%") do set APP_HOME=%%~fi
-
-@rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
-set DEFAULT_JVM_OPTS="-Xmx64m" "-Xms64m"
-
-@rem Find java.exe
-if defined JAVA_HOME goto findJavaFromJavaHome
-
-set JAVA_EXE=java.exe
-%JAVA_EXE% -version >NUL 2>&1
-if "%ERRORLEVEL%" == "0" goto execute
-
-echo.
-echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
-echo.
-echo Please set the JAVA_HOME variable in your environment to match the
-echo location of your Java installation.
-
-goto fail
-
-:findJavaFromJavaHome
-set JAVA_HOME=%JAVA_HOME:"=%
-set JAVA_EXE=%JAVA_HOME%/bin/java.exe
-
-if exist "%JAVA_EXE%" goto execute
-
-echo.
-echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME%
-echo.
-echo Please set the JAVA_HOME variable in your environment to match the
-echo location of your Java installation.
-
-goto fail
-
-:execute
-@rem Setup the command line
-
-set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar
-
-
-@rem Execute Gradle
-"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %*
-
-:end
-@rem End local scope for the variables with windows NT shell
-if "%ERRORLEVEL%"=="0" goto mainEnd
-
-:fail
-rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of
-rem the _cmd.exe /c_ return code!
-if  not "" == "%GRADLE_EXIT_CONSOLE%" exit 1
-exit /b 1
-
-:mainEnd
-if "%OS%"=="Windows_NT" endlocal
-
-:omega
+@rem
+@rem Copyright 2015 the original author or authors.
+@rem
+@rem Licensed under the Apache License, Version 2.0 (the "License");
+@rem you may not use this file except in compliance with the License.
+@rem You may obtain a copy of the License at
+@rem
+@rem      https://www.apache.org/licenses/LICENSE-2.0
+@rem
+@rem Unless required by applicable law or agreed to in writing, software
+@rem distributed under the License is distributed on an "AS IS" BASIS,
+@rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+@rem See the License for the specific language governing permissions and
+@rem limitations under the License.
+@rem
+
+@if "%DEBUG%" == "" @echo off
+@rem ##########################################################################
+@rem
+@rem  Gradle startup script for Windows
+@rem
+@rem ##########################################################################
+
+@rem Set local scope for the variables with windows NT shell
+if "%OS%"=="Windows_NT" setlocal
+
+set DIRNAME=%~dp0
+if "%DIRNAME%" == "" set DIRNAME=.
+set APP_BASE_NAME=%~n0
+set APP_HOME=%DIRNAME%
+
+@rem Resolve any "." and ".." in APP_HOME to make it shorter.
+for %%i in ("%APP_HOME%") do set APP_HOME=%%~fi
+
+@rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
+set DEFAULT_JVM_OPTS="-Xmx64m" "-Xms64m"
+
+@rem Find java.exe
+if defined JAVA_HOME goto findJavaFromJavaHome
+
+set JAVA_EXE=java.exe
+%JAVA_EXE% -version >NUL 2>&1
+if "%ERRORLEVEL%" == "0" goto execute
+
+echo.
+echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
+echo.
+echo Please set the JAVA_HOME variable in your environment to match the
+echo location of your Java installation.
+
+goto fail
+
+:findJavaFromJavaHome
+set JAVA_HOME=%JAVA_HOME:"=%
+set JAVA_EXE=%JAVA_HOME%/bin/java.exe
+
+if exist "%JAVA_EXE%" goto execute
+
+echo.
+echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME%
+echo.
+echo Please set the JAVA_HOME variable in your environment to match the
+echo location of your Java installation.
+
+goto fail
+
+:execute
+@rem Setup the command line
+
+set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar
+
+
+@rem Execute Gradle
+"%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %*
+
+:end
+@rem End local scope for the variables with windows NT shell
+if "%ERRORLEVEL%"=="0" goto mainEnd
+
+:fail
+rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of
+rem the _cmd.exe /c_ return code!
+if  not "" == "%GRADLE_EXIT_CONSOLE%" exit 1
+exit /b 1
+
+:mainEnd
+if "%OS%"=="Windows_NT" endlocal
+
+:omega

The next thing I could think of was permissions, but the file was -rwxr-xr-x and remains -rwxr-xr-x.

I tried to see if there's anything else via stat, but I found no clue there either:

❯ git reset --hard HEAD && stat gradlew.bat && git status && stat gradlew.bat
HEAD is now at f6d1022 remove irrelevant stuff
  File: gradlew.bat
  Size: 2763            Blocks: 8          IO Block: 4096   regular file
Device: 259,2   Inode: 7342244     Links: 1
Access: (0755/-rwxr-xr-x)  Uid: ( 1000/  <redacted>)   Gid: ( 1000/  <redacted>)
Access: 2022-07-05 14:48:48.314141714 +0200
Modify: 2022-07-05 14:48:48.314141714 +0200
Change: 2022-07-05 14:48:48.314141714 +0200
 Birth: 2022-07-05 14:48:48.314141714 +0200
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   gradlew.bat

no changes added to commit (use "git add" and/or "git commit -a")
  File: gradlew.bat
  Size: 2763            Blocks: 8          IO Block: 4096   regular file
Device: 259,2   Inode: 7342244     Links: 1
Access: (0755/-rwxr-xr-x)  Uid: ( 1000/  <redacted>)   Gid: ( 1000/  <redacted>)
Access: 2022-07-05 14:48:48.314141714 +0200
Modify: 2022-07-05 14:48:48.314141714 +0200
Change: 2022-07-05 14:48:48.314141714 +0200
 Birth: 2022-07-05 14:48:48.314141714 +0200

I'm now out of ideas. What is causing this behaviour?

Upvotes: 2

Views: 1889

Answers (1)

torek

Reputation: 487745

This means it's not even a matter of LF/CRLF line endings.

Ah, but it is.

Your repository is clone-able, so I cloned it. Here's what's actually in the file:

$ git rev-parse HEAD:gradlew.bat
ac1b06f93825db68fb0c0b5150917f340eaa5d02
$ git cat-file -p ac1b06f93825db68fb0c0b5150917f340eaa5d02 | head -3 | vis
@rem\^M
@rem Copyright 2015 the original author or authors.\^M
@rem\^M

The vis command shows what's in the file, making sure that control characters like carriage return (control-M) are visible as backslash, hat, letter-code. We see that the file actually has CRLF endings as stored in the repository. This copy of the file literally cannot be changed, because it's inside a commit, and no part of any commit can ever be changed.
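
If vis is not installed (it comes from BSD and is often missing on Linux), od -c is a POSIX alternative that should show a carriage return as \r. The snippet below is only a sketch on a made-up one-line input; against the real repository the same pipeline would start from git cat-file instead of printf.

```shell
# Illustrative input only. In the real repository this would be:
#   git cat-file -p HEAD:gradlew.bat | head -3 | od -c
sample=$(printf '@rem\r\n' | od -c)

# od -c prints each byte; a carriage return appears as the two characters \r.
echo "$sample"
```

Any line that shows \r immediately before \n has a CRLF ending.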

Curiously, we find the following .gitattributes file:

$ vis .gitattributes
* text=auto eol=lf
*.[cC][mM][dD] text eol=crlf
*.[bB][aA][tT] text eol=crlf
*.[pP][sS]1 text eol=crlf

Now, the interesting thing about a .gitattributes like this is that it tells Git to mess with file data. The tricky part is how Git will go about doing this messing-with-file-data:

  • Git will, on copying the file out of a Git commit or other internal frozen-format version (e.g., extracting from commit or index to working tree), do LF-only to CRLF editing if/as directed;
  • Git will, on copying the file from the working tree to compress it and store it in the repository (in the index or eventually in a commit), do CRLF to LF-only editing if/as directed.

The "as directed" part is complex and is determined by the rules in the .gitattributes, but yours add up to saying that for *.bat files, Git should do the operation in both cases. So it does:

  • On the way out, any LF-only line endings that are present become CRLF endings.
  • On the way in, any CRLF line endings become LF-only endings.

Since the file as committed has CRLF endings, nothing happens "on the way out", but should you put the file back in, it will be changed to be stored with LF-only line endings.
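
This is also why the md5sum and sha512sum checks match: they compare working-tree bytes with committed bytes, and those really are identical. Git instead compares the committed blob with what it would store after the "way in" conversion. A minimal sketch of that comparison follows, run in a throwaway repository; the file name demo.bat, its contents, and the identity settings are made up for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
# Keep machine-wide settings out of the demonstration.
git config core.autocrlf false
git config commit.gpgsign false
git config user.name demo
git config user.email demo@example.invalid

# 1. Commit a CRLF file before any attributes exist, so the blob keeps its CRLFs.
printf '@rem\r\n@echo off\r\n' > demo.bat
git add demo.bat
git commit -q -m 'add CRLF file'

# 2. Only now declare *.bat as text with CRLF working-tree endings.
printf '*.bat text eol=crlf\n' > .gitattributes

committed=$(git rev-parse HEAD:demo.bat)       # blob stored in the commit
raw=$(git hash-object --no-filters demo.bat)   # working-tree bytes, unconverted
cleaned=$(git hash-object demo.bat)            # bytes after CRLF-to-LF conversion

echo "committed: $committed"
echo "raw:       $raw"
echo "cleaned:   $cleaned"
```

Here committed and raw should be equal while cleaned differs, which is the mismatch that git status reports as "modified".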

We can see this in action here. We start with git ls-files --eol to tell us what's actually in the index and working tree, for each file stored in Git's index:

$ git ls-files --eol
i/lf    w/lf    attr/text=auto eol=lf   .gitattributes
i/lf    w/lf    attr/text=auto eol=lf   .gitignore
i/crlf  w/crlf  attr/text eol=crlf      gradlew.bat

So we see that the attrs applied to gradlew.bat are text eol=crlf. The attrs applied to the other files are text=auto eol=lf.

The index and working tree copies of the .gitattributes and .gitignore are LF-only. The index and working tree copies of gradlew.bat are CRLF (for both).
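
To ask Git directly which attributes it resolves for a path, git check-attr can be used. The sketch below recreates the same .gitattributes in a scratch repository (the temporary directory is just for illustration) and queries two paths; the paths do not need to exist for the query to work.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Same rules as the .gitattributes shown above.
cat > .gitattributes <<'EOF'
* text=auto eol=lf
*.[cC][mM][dD] text eol=crlf
*.[bB][aA][tT] text eol=crlf
*.[pP][sS]1 text eol=crlf
EOF

# Later matching lines override earlier ones, attribute by attribute.
bat_attrs=$(git check-attr text eol -- gradlew.bat)
other_attrs=$(git check-attr text eol -- .gitignore)

echo "$bat_attrs"
echo "$other_attrs"
```

For gradlew.bat this should report text as set and eol as crlf, while .gitignore only matches the first rule and gets text as auto with eol as lf.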

If we now git add gradlew.bat (we may need to use --renormalize, depending on Git vintage, certain raw stat data and timings, and a lot of other details that vary from one system to another) and then run git ls-files --eol again, we see that the index version of gradlew.bat has changed:

$ git ls-files --eol
i/lf    w/lf    attr/text=auto eol=lf   .gitattributes
i/lf    w/lf    attr/text=auto eol=lf   .gitignore
i/lf    w/crlf  attr/text eol=crlf      gradlew.bat

Committing this version will make a new commit in which the stored-for-all-time copy has LF-only line endings. Every extraction will produce CRLF endings, because gradlew.bat has attr/text eol=crlf applied, and every git add will have those CRLF endings changed back to LF-only.

This whole area of Git's operation is very messy. If it's possible to not have Git mess with line endings, that's always my preference. However, if some files must have CRLF endings, the .gitattributes style you've written is my preference here: the files in the repository will be LF-only, but the files in your working tree will be CRLF files. You may have to do one git add --renormalize . pass to "clean up" and commit so that from then on, Git is happy with things.
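
The clean-up pass can be sketched end to end in a scratch repository. Everything here (demo.bat, its contents, the commit messages, the identity settings) is a stand-in for illustration; in the real repository only the git add --renormalize . and git commit steps would be needed. Note that --renormalize only touches tracked files, so a new .gitattributes has to be added separately.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config core.autocrlf false
git config commit.gpgsign false
git config user.name demo
git config user.email demo@example.invalid

# Recreate the problem: a CRLF blob committed before the attributes exist.
printf '@rem\r\n@echo off\r\n' > demo.bat
git add demo.bat
git commit -q -m 'add CRLF file'
printf '*.bat text eol=crlf\n' > .gitattributes

# The fix: stage the attributes, re-clean tracked files, commit.
git add .gitattributes
git add --renormalize .
git commit -q -m 'normalize line endings'

eol_line=$(git ls-files --eol demo.bat)
porcelain=$(git status --porcelain)
echo "$eol_line"
echo "status: [$porcelain]"
```

After the second commit, the ls-files --eol line should show i/lf with w/crlf, and git status should have nothing left to report.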

Upvotes: 6
