Siggi

Losing history when moving files with git

According to an answer I read here about moving files tracked by git, moving a file that git is tracking should not affect its history. That is not my experience, though, so what am I doing wrong? Here is my console log:

C:\scripts\Python\Cyren>mkdir archive
    
C:\scripts\Python\Cyren>dir
 Volume in drive C has no label.
 Volume Serial Number is 7C59-18A2

 Directory of C:\scripts\Python\Cyren

08-May-22  22:44    <DIR>          .
08-May-22  22:44    <DIR>          ..
08-May-22  22:43    <DIR>          archive
31-Mar-22  19:11             1,878 categories.csv
31-Mar-22  17:57             1,886 categories.txt
30-Mar-22  21:19            14,557 categories.xlsx
29-Apr-22  16:23            19,274 CyrenDopplerAPI.py
06-May-22  12:35            14,585 CyrenDopplerEnv.py
29-Apr-22  16:16            17,672 CyrenSample.py

C:\scripts\Python\Cyren>git log CyrenDopplerAPI.py
commit 4f440a2c132053ebe9c76a16e90abc1dd845d262
Author: Siggi@Reba <[email protected]>
Date:   Thu May 5 15:38:58 2022 +0000

    rename

C:\scripts\Python\Cyren>git mv CyrenDopplerAPI.py archive

C:\scripts\Python\Cyren>dir
 Volume in drive C has no label.
 Volume Serial Number is 7C59-18A2

 Directory of C:\scripts\Python\Cyren

08-May-22  22:48    <DIR>          .
08-May-22  22:48    <DIR>          ..
08-May-22  22:48    <DIR>          archive
31-Mar-22  19:11             1,878 categories.csv
31-Mar-22  17:57             1,886 categories.txt
30-Mar-22  21:19            14,557 categories.xlsx
06-May-22  12:35            14,585 CyrenDopplerEnv.py
05-May-22  18:24               109 Infile.txt
06-May-22  12:35    <DIR>          Logs
               5 File(s)         33,015 bytes
               4 Dir(s)  484,630,781,952 bytes free

C:\scripts\Python\Cyren>git status
On branch master
Your branch is up to date with 'origin/master'.

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        renamed:    CyrenDopplerAPI.py -> archive/CyrenDopplerAPI.py
        renamed:    CyrenSample.py -> archive/CyrenSample.py

C:\scripts\Python\Cyren>git commit -m "cleanup and archiving"
[master 1240039] cleanup and archiving
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename Cyren/{ => archive}/CyrenDopplerAPI.py (100%)
 rename Cyren/{ => archive}/CyrenSample.py (100%)

C:\scripts\Python\Cyren>git status
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

C:\scripts\Python\Cyren>git push
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 454 bytes | 454.00 KiB/s, done.
Total 4 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/siggib007/python.git
   e85489f..1240039  master -> master

C:\scripts\Python\Cyren>git log archive\CyrenDopplerAPI.py
commit 1240039a7e93993027f22979908ee4a9837a5474 (HEAD -> master, origin/master)
Author: Siggi@Reba <[email protected]>
Date:   Sun May 8 22:49:03 2022 +0000

    cleanup and archiving

After the move, git log no longer shows the log entries that were there before it. I take that to mean the history was erased.

By the way, here is the link to the corresponding GitHub repo: https://github.com/siggib007/python/tree/master/Cyren

Also, this is not an isolated incident: this console log is from me reproducing an issue I've noticed a couple of times before.

Upvotes: 4

Views: 3597

Answers (2)

torek

A few brief (for me) notes:

The basic problem here is that Git does not have "file history". Git has commits and the commits are the history; each commit has a full snapshot of every file; and as a result, renaming a file and committing produces a new commit in which, by comparison to the snapshot for the old commit, file old/path/to/file.ext is deleted and new file new/path/to/file.ext appears out of nowhere. The contents of these two differently-named files match, so Git de-duplicates the contents in the commit snapshots, and there's literally only one copy of the file in the repository. But as far as the pair of snapshots goes, the difference between the old one and the new one is "delete old file, create new file".

What Git does to handle this case is—optionally—to look for cases of "delete old, create new" in which the old and new files are exactly the same, or sufficiently similar. Having detected such a case, git diff will call the file renamed.
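As a rough illustration (not part of the original answer), the POSIX shell sketch below builds a throwaway repository, moves a file, and compares the two snapshots with and without rename detection. The repository, the file name a.py, and the author identity are hypothetical placeholders; the question used Windows cmd, but the git commands are the same.

```shell
#!/bin/sh
# Throwaway repository; every name here is a hypothetical placeholder.
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'print("hello")\n' > a.py
git add a.py
git commit -q -m "add a.py"

# git mv needs the destination directory to exist.
mkdir archive
git mv a.py archive/a.py
git commit -q -m "move a.py into archive/"

# Raw comparison of the two snapshots: one path deleted, another added.
no_renames=$(git diff --name-status --no-renames HEAD~1 HEAD)
printf '%s\n' "$no_renames"

# Same comparison with rename detection switched on (-M).
with_renames=$(git diff --name-status -M HEAD~1 HEAD)
printf '%s\n' "$with_renames"
```

With --no-renames the output should contain a D line for a.py and an A line for archive/a.py; with -M the same pair is reported as one R100 line, i.e. a rename with identical content.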

The git log command normally works commit-by-commit, and when using git log -p to show patches, runs git diff on each <old, new> pair. You can have this git diff do rename detection, and it will then say "rename <old path> to <new path>". There are some minor flaws in this scheme, but overall, it works fairly well.

The rub comes in when you want to run git log -- path/to/file.ext. What this does is instruct git log to look at all the commits in this particular history the same way it would for git log without the path name, but then print only some of those commits. Remember, the history is the commits (all of them), so this looks at all the same history as git log without the path name.[1] But we don't want all that history; we only want the commits in which path/to/file.ext changes from the old to the new snapshot. So git log prints only those commits in which this exact path's file changes.
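To see this path filtering on its own, with no rename involved, here is a small sketch under the same assumptions (a throwaway repository; a.py, b.py and the author identity are hypothetical). The repository ends up with three commits, but only two of them change a.py, so git log -- a.py should list only those two.

```shell
#!/bin/sh
# Throwaway repository; file names and identity are hypothetical.
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'v1\n' > a.py
git add a.py
git commit -q -m "add a.py"

printf 'unrelated\n' > b.py
git add b.py
git commit -q -m "add b.py"

printf 'v2\n' > a.py
git commit -q -am "edit a.py"

# Every commit reachable from HEAD ...
all_commits=$(git rev-list --count HEAD)
# ... versus only the commits whose snapshot changes a.py.
a_commits=$(git log --format=%h -- a.py | wc -l | tr -d ' ')
echo "$all_commits commits in total, $a_commits of them change a.py"
```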

But at some point earlier, path/to/file.ext was named path/file.ext, without the middle to/ component. When git log traverses backwards through that commit, the file is no longer called path/to/file.ext; it is called by its old name, path/file.ext. So git log omits every commit that doesn't change path/to/file.ext, even though it should be printing the commits that do change path/file.ext.

That's what --follow does: it tells git log that, as it is working its way backwards through history, it should detect renames. Having detected a rename, it should stop looking for the new name and start looking for the old one instead.

This works great, but there's a huge problem: it only works for one file name. This has been the case for well over a decade. The --follow code for Git is a horrible hack internally and is long overdue for improvement (but it's very tricky to do it efficiently and correctly, which is the reason it has never been improved).
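The behaviour described above can be reproduced with a sketch like the following (again a throwaway repository with hypothetical file names). It edits a.py, moves it into archive/, and counts how many commits git log reports for the new path with and without --follow. It also tries --follow with two paths (other.py is a hypothetical second path), which git should refuse with an error.

```shell
#!/bin/sh
# Throwaway repository; file names and identity are hypothetical.
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
# Repo-local override so a global log.follow=true can't affect the plain count.
git config log.follow false

printf 'v1\n' > a.py
git add a.py
git commit -q -m "add a.py"

printf 'v2\n' > a.py
git commit -q -am "edit a.py"

mkdir archive
git mv a.py archive/a.py
git commit -q -m "archive a.py"

# Plain pathspec: only the commit where archive/a.py first appears.
plain=$(git log --format=%h -- archive/a.py | wc -l | tr -d ' ')
# --follow: switches to the old name a.py once the rename is detected.
followed=$(git log --follow --format=%h -- archive/a.py | wc -l | tr -d ' ')
echo "without --follow: $plain commit(s); with --follow: $followed commit(s)"

# --follow works on a single path only; two paths make git log fail.
if git log --follow -- archive/a.py other.py >/dev/null 2>&1; then
  multi_path=accepted
else
  multi_path=rejected
fi
echo "--follow with two paths: $multi_path"
```

This mirrors the question: the plain log for the moved file shows just the move commit, while --follow also reaches the two earlier commits made under the old name.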

So: use --follow but be aware of its limitations. It only works on one file, and even then, Git has to detect the rename. Git usually does detect the rename and you will see this when you make the commit itself:

C:\scripts\Python\Cyren>git commit -m "cleanup and archiving"
[master 1240039] cleanup and archiving
 2 files changed, 0 insertions(+), 0 deletions(-)
 rename Cyren/{ => archive}/CyrenDopplerAPI.py (100%)
 rename Cyren/{ => archive}/CyrenSample.py (100%)

That "rename" output indicates that Git detected the rename this time, so it will continue to do so with git log --follow. The (100%) means it used the fastest path for doing that detection (another plus: the --follow case will go faster, though you'll probably never really notice these days).


[1] Actually, git log with pathspecs doesn't look at all the history by default. To be precise, this turns on what git log calls "History Simplification". History simplification is tricky; it sometimes does what you want, and sometimes does the exact opposite of what you want. The main things to know about it are:

  • it exists;
  • it turns on when you use git log with path names; and
  • you can turn it off with --full-history if it's causing you grief.

Upvotes: 4

John Kugelman

Use git log --follow to track renames.

--follow

Continue listing the history of a file beyond renames (works only for a single file).

You can make this the default by setting the log.follow configuration option to true:

log.follow

If true, git log will act as if the --follow option was used when a single <path> is given. This has the same limitations as --follow, i.e. it cannot be used to follow multiple files and does not work well on non-linear history.

To set it, run:

$ git config --global log.follow true
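A quick way to check the setting's effect, sketched under the same assumptions as a throwaway repository with hypothetical file names, is to set it and then run git log with a single path and no --follow flag. The sketch sets it repo-locally rather than with --global so that it does not change your own configuration.

```shell
#!/bin/sh
# Throwaway repository; file names and identity are hypothetical.
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
# Repo-local equivalent of: git config --global log.follow true
git config log.follow true

printf 'v1\n' > a.py
git add a.py
git commit -q -m "add a.py"

printf 'v2\n' > a.py
git commit -q -am "edit a.py"

mkdir archive
git mv a.py archive/a.py
git commit -q -m "archive a.py"

# No --follow on the command line, but log.follow makes git log behave as if it were.
count=$(git log --format=%h -- archive/a.py | wc -l | tr -d ' ')
echo "git log -- archive/a.py lists $count commit(s)"
```

The same limitation applies as with the explicit flag: the setting only has an effect when exactly one path is given.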

Upvotes: 5
