Reputation: 38737
I'd like to extract some files into a new repo while keeping their history, including file renames.
The best and closest answer I could find was new-repo-with-copied-history-of-only-currently-tracked-files, which uses git filter-branch --index-filter. It successfully keeps the history of existing files, but it doesn't preserve the history of renamed files.
(Another answer I found uses git filter-branch --subdirectory-filter. But it has two issues: it doesn't seem to work for the whole repo (folder '.'), and it doesn't preserve the history of renamed files.)
(Yet another answer uses git subtree. But it doesn't keep history at all.)
So I'm probably looking for a way to improve the git ls-files > keep-these.txt command from the closest answer so that it also lists all previous file names. Maybe a script?
Upvotes: 2
Views: 524
Reputation: 489998
Git doesn't store file name changes.
Each commit stores a complete tree, e.g., perhaps commit 1234567... has files README and foo.txt and commit fedcba9... has files readme.txt and foo. If you ask git to compare commit 1234567 to commit fedcba9, and README is sufficiently similar[1] to readme.txt, git will say that the way to transform the one commit to the other is to rename the file. (If the one commit is the parent of the other, git show of the child commit will show the rename, because git show computes this change at git show time.)
On the other hand, if the second readme file is too different, but README is sufficiently similar to foo, git will say that the way to change 1234567 to achieve fedcba9 is to rename README to foo.
The key is that git computes that when you ask for the comparison, and not a moment earlier. There's nothing in between the commits that says "rename some files". Git simply compares the commits and decides then whether the files are similar enough.
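To see this for yourself, here is a small sketch that builds a throwaway repository (the temp directory, file contents, and identity settings are all made up for the illustration) and asks git to compare the same two commits twice, once with rename detection and once without:

```shell
# Throwaway repo; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > README
git add README
git commit -q -m "add README"

git mv README readme.txt
git commit -q -m "rename README"

# Same pair of commits, two different answers, depending on what we ask for.
with_renames=$(git diff --name-status -M HEAD~1 HEAD)
without_renames=$(git diff --name-status --no-renames HEAD~1 HEAD)

printf 'with -M:\n%s\n' "$with_renames"
printf 'with --no-renames:\n%s\n' "$without_renames"
```

With -M the comparison should come back as a single R (rename) entry; with --no-renames the very same commits should be described as one deleted file and one added file. Nothing in the repository changed between the two commands, only the question asked.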
For your purposes, what this ultimately means is that for each commit in your sequence-of-commits-to-copy-or-partially-copy, you'll have to decide which path names to keep and which to discard. How to achieve that is mostly up to you. The git log command does have a --follow flag to activate a limited amount of rename detection as it works backwards from child commits to their parents, and git blame automatically tries to do the same; you could use these (one path name at a time) to come up with a mapping of the form:
    in commits:  A..B          C..D           E..F
    use path:    dir/file.ext  dir/frill.txt  lib/frill.next
for instance. But there's nothing built in to do this, and it won't be particularly easy. I'd start by combining git log --follow with --raw or --name-status output and seeing if there are any interesting Renames detected. If and when there are, those are the commit boundaries at which you'll want to change which paths you're keeping and discarding as you work through commits (whether that's with filter-branch or some other method).
If that doesn't work, or you need more control, consider running git diff --name-status between various commit pairs (with commit pair info coming from git rev-list).
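A rough sketch of that second approach, assuming you only care about each commit versus its first parent (the function name list_renames is my own, not a git command):

```shell
# List renames between each commit and its first parent, oldest commit first.
# Usage: list_renames [revision]   (defaults to HEAD)
list_renames() {
    git rev-list --reverse --first-parent "${1:-HEAD}" |
    while read -r commit; do
        # Root commits have no parent to compare against, so skip them.
        git rev-parse -q --verify "$commit^" >/dev/null || continue
        # -M turns rename detection on; --diff-filter=R keeps only renames.
        git diff --name-status -M --diff-filter=R "$commit^" "$commit" |
        while IFS=$(printf '\t') read -r status old new; do
            printf '%s %s %s -> %s\n' "$commit" "$status" "$old" "$new"
        done
    done
}
```

Each output line gives the commit in which the rename shows up, the similarity score (for example R100), and the old and new paths. Merges are compared against their first parent only here; whether that is the right choice depends on your history.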
[1] As long as you've asked for rename detection, "exactly the same" is sufficiently similar, as is anything down to about "50% similar". You can tweak the required similarity with the optional value you supply to git diff's -M flag.
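As a quick illustration of the threshold (again in a made-up scratch repository), the sketch below renames a ten-line file and changes one of its lines in the same commit, then compares the result at the default threshold and at -M100%, which restricts detection to exact renames:

```shell
# Scratch repo; file names and contents are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > var.c
git add var.c
git commit -q -m "add var.c"

# Rename the file and change its last line in the same commit.
git mv var.c builtin-var.c
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 changed > builtin-var.c
git commit -q -a -m "rename and edit"

# Default threshold (about 50%) versus exact matches only.
default_threshold=$(git diff --name-status -M HEAD~1 HEAD)
exact_only=$(git diff --name-status -M100% HEAD~1 HEAD)

printf 'default -M:\n%s\n' "$default_threshold"
printf '%s\n' '-M100%:' "$exact_only"
```

The first comparison should report a rename with a score below 100, while the second should fall back to an add plus a delete, since the contents are no longer identical.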
Edit: this seems to work OK. I used it on git's own builtin/var.c, which has had two previous names according to this:
$ git log --follow --raw --diff-filter=R --pretty=format:%H builtin/var.c
81b50f3ce40bfdd66e5d967bf82be001039a9a98
:100644 100644 2280518... 2280518... R100 builtin-var.c builtin/var.c
55b6745d633b9501576eb02183da0b0fb1cee964
:100644 100644 d9892f8... 2280518... R096 var.c builtin-var.c
The --diff-filter suppresses everything but rename outputs so that we get to see which commit seems to rename the file. Turning this into something more useful requires a bit more work, but this might get you fairly far:
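git log --follow --raw --diff-filter=R --pretty=format:%H builtin/var.c |
while read -r hash; do
    # the --raw line is tab-separated: modes/hashes/status, old name, new name
    IFS=$'\t' read -r mode_etc oldname newname
    # records are separated by a blank line (missing after the last one)
    read -r blankline
    echo "in $hash, rename $oldname to $newname"
done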
which produced:
in 81b50f3ce40bfdd66e5d967bf82be001039a9a98, rename builtin-var.c to builtin/var.c
in 55b6745d633b9501576eb02183da0b0fb1cee964, rename var.c to builtin-var.c
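To get closer to the keep-these.txt list from the question, the same idea can be wrapped in a function that prints every name one path has had, then run over everything git ls-files reports. This is only a sketch: the function names are mine, it relies on --follow's limited rename detection, and it will be slow on a large repository because it runs one git log per file.

```shell
# Print every path a file has had, current name first, one per line.
all_names_of() {
    printf '%s\n' "$1"
    # An empty --pretty=format: leaves only the name-status lines (plus blanks).
    git log --follow --name-status --diff-filter=R --pretty=format: -- "$1" </dev/null |
    while IFS=$(printf '\t') read -r status old new; do
        # Blank separator lines leave $old empty; skip them.
        if [ -n "$old" ]; then
            printf '%s\n' "$old"
        fi
    done
}

# Current and former names of every tracked file, de-duplicated.
all_tracked_names() {
    git ls-files |
    while read -r path; do
        all_names_of "$path"
    done | sort -u
}
```

Running all_tracked_names > keep-these.txt from the top of the work tree would then stand in for the plain git ls-files > keep-these.txt step. Check the result by hand: a former name may have been reused later by an unrelated file.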
Upvotes: 3