Andreas Pardeike

Reputation: 5042

Mirroring git repositories one-way while applying deterministic changes

I have the following external git repo:

test % git log
commit 022c4bebe111329c9b07e714c353d68d238d6187
Author: Andreas Pardeike <[email protected]>
Date:   Wed Mar 29 10:13:36 2023 +0200

    first commit

test % cat README.md
# test

I clone it into a temp repo test-1 and run

test-1 % git filter-branch --tree-filter '
  export GIT_AUTHOR_DATE="$(git log -1 --format=%aI)"
  export GIT_COMMITTER_DATE="$(git log -1 --format=%cI)"
  echo "// modified" >> README.md
' -- --all

test-1 % git log
commit c79e6cc1180be936358a41d8beaba99a7ce33c71 (HEAD -> main, origin/main, origin/HEAD)
Author: Andreas Pardeike <[email protected]>
Date:   Wed Mar 29 10:13:36 2023 +0200

    first commit

Now I change README.md in the original repo:

test % echo "# test-new" > README.md
test % cat README.md
# test-new
test % git add .
test % git commit -m "change"
[main f04f865] change
 1 file changed, 1 insertion(+), 1 deletion(-)
test % git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Writing objects: 100% (3/3), 255 bytes | 255.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.com/pardeike/test.git
   022c4be..f04f865  main -> main
test % git log
commit f04f865c6e546318df7256e4cd48b3bb9505a710 (HEAD -> main, origin/main)
Author: Andreas Pardeike <[email protected]>
Date:   Wed Mar 29 10:20:06 2023 +0200

    change

commit 022c4bebe111329c9b07e714c353d68d238d6187
Author: Andreas Pardeike <[email protected]>
Date:   Wed Mar 29 10:13:36 2023 +0200

    first commit

and I clone it afterwards into a new temp repo test-2 and run the same modification:

test-2 % git log
commit f04f865c6e546318df7256e4cd48b3bb9505a710 (HEAD -> main, origin/main, origin/HEAD)
Author: Andreas Pardeike <[email protected]>
Date:   Wed Mar 29 10:20:06 2023 +0200

    change

commit 022c4bebe111329c9b07e714c353d68d238d6187
Author: Andreas Pardeike <[email protected]>
Date:   Wed Mar 29 10:13:36 2023 +0200

    first commit

test-2 % git filter-branch --tree-filter '
  export GIT_AUTHOR_DATE="$(git log -1 --format=%aI)"
  export GIT_COMMITTER_DATE="$(git log -1 --format=%cI)"
  echo "// modified" >> README.md
' -- --all

test-2 % git log
commit 00513669b9a95db42317c6734152e7b2861b89b7 (HEAD -> main, origin/main, origin/HEAD)
Author: Andreas Pardeike <[email protected]>
Date:   Wed Mar 29 10:20:06 2023 +0200

    change

commit 6bc296719d3aa3844c1ebb352f3daf67184f0025
Author: Andreas Pardeike <[email protected]>
Date:   Wed Mar 29 10:20:06 2023 +0200

    first commit

Question: Why does the hash of 'first commit' differ between the temp repos test-1 and test-2? I understand that the rewritten hashes differ from the original ones, but how do I keep them identical in every temp repository?

Why: My task is to mirror external repositories for in-house use, where we have no internet access. During mirroring I need to rewrite one specific file so that any dependency URLs point to our internal git repository. I do this by appending a static text snippet to the end of an existing, well-known file, for all tags and commits in every repository processed. There is no risk of conflicts because the internal repositories are read-only.

Alternatively: How can I process only the new commits when mirroring from the original to the internal repositories?

Upvotes: 0

Views: 46

Answers (2)

Øsse

Reputation: 171

In the second repo, both commits' author dates (and presumably their committer dates) equal those of the second commit in the original repo. This suggests that the values you export are wrong: git log -1 without a revision describes HEAD, not the commit being processed. To reference the currently processed commit, use git log -1 ... $GIT_COMMIT. However, --tree-filter does not update these dates by default, so in your case you don't have to do anything to tweak them.
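
The difference between HEAD and $GIT_COMMIT inside a filter can be checked with a small throwaway repository. The following is only a sketch under assumptions: the temp directory, the placeholder identity, and the DATE_LOG variable are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the deprecation notice that newer git versions print.

```shell
#!/bin/sh
# Sketch: log, for each rewritten commit, the author date of HEAD
# next to the author date of $GIT_COMMIT.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice and its delay

work="$(mktemp -d)"
export DATE_LOG="$work/dates.log"        # hypothetical log file, kept outside the repo

git init -q "$work/repo"
cd "$work/repo"
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.invalid"

# Two commits carrying the dates shown in the question.
echo "# test" > README.md
git add README.md
GIT_AUTHOR_DATE="2023-03-29T10:13:36+02:00" \
GIT_COMMITTER_DATE="2023-03-29T10:13:36+02:00" \
  git commit -q -m "first commit"

echo "# test-new" > README.md
GIT_AUTHOR_DATE="2023-03-29T10:20:06+02:00" \
GIT_COMMITTER_DATE="2023-03-29T10:20:06+02:00" \
  git commit -q -am "change"

# The filter only records dates and appends the marker line;
# it does not export any GIT_*_DATE variables.
git filter-branch --tree-filter '
  head_date="$(git log -1 --format=%aI)"
  own_date="$(git log -1 --format=%aI "$GIT_COMMIT")"
  echo "head=$head_date commit=$own_date" >> "$DATE_LOG"
  echo "// modified" >> README.md
' -- --all >/dev/null 2>&1

# One line per processed commit, oldest first.
cat "$DATE_LOG"
```

For the first commit the two values should differ, which is why exporting the HEAD-based dates stamps every rewritten commit with the date of the newest one.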

Upvotes: 2

Andreas Pardeike

Reputation: 5042

Never mind: git filter-branch --tree-filter creates a new history with its own commit hashes, but the rewrite is deterministic, so those hashes stay the same when the mirroring is repeated after new commits arrive in the original repository.

So just use git filter-branch --tree-filter 'echo "// modified" >> README.md' -- --all

The resulting temp repository can then be pushed to an internal repository without risk of conflicts.
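
A rough way to convince yourself of the stable hashes is to replay the scenario from the question against a local stand-in for the external repository. This is a sketch, not a finished mirroring script: the directory names, the placeholder identity, and the rewrite helper are assumptions for illustration, and pushing to the internal remote is left out.

```shell
#!/bin/sh
# Sketch: rewrite two clones taken at different times and compare
# the hash of the rewritten root commit.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice and its delay

work="$(mktemp -d)"
cd "$work"

# Local stand-in for the external repository.
git init -q upstream
git -C upstream config user.name "Example User"        # placeholder identity
git -C upstream config user.email "user@example.invalid"
echo "# test" > upstream/README.md
git -C upstream add README.md
git -C upstream commit -q -m "first commit"

# Hypothetical helper: clone upstream into $1 and append the marker
# line to README.md in every commit, without touching any dates.
rewrite() {
  git clone -q upstream "$1"
  (
    cd "$1"
    git filter-branch --tree-filter 'echo "// modified" >> README.md' \
      -- --all >/dev/null 2>&1
  )
}

rewrite test-1
root_1="$(git -C test-1 rev-list --max-parents=0 HEAD)"

# A new commit arrives upstream, then a fresh clone is rewritten.
echo "# test-new" > upstream/README.md
git -C upstream commit -q -am "change"

rewrite test-2
root_2="$(git -C test-2 rev-list --max-parents=0 HEAD)"

echo "test-1 root: $root_1"
echo "test-2 root: $root_2"
if [ "$root_1" = "$root_2" ]; then
  echo "rewritten root commit is stable"
fi
```

Because the rewritten history of test-2 extends that of test-1, pushing test-2 to a mirror that already holds test-1 should be a plain fast-forward.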

Upvotes: 1
