I would like to replay a Git repo while applying some code reformatting and other code filters, and yes, I am aware of all the risks of doing so.
Unfortunately, this takes a very long time, and it is impossible to freeze the work for that long. I know how to replay a branch up to some point.
What I am looking for are ideas on how to replay a branch from another repo in a way that can be resumed.
Essentially, an algorithm like this, in pseudocode:
starting_sha = very_last
if resume {
    starting_sha = last_applied_sha
}
for_each sha = commit --reversed from starting_sha to the HEAD {
    git checkout sha
    apply some changes to the code
    git commit to target repo with metadata from sha
    update last_applied_sha = sha
}
Obviously, I could easily implement such a script, but "git commit to target repo with metadata from sha" is something I would rather not deal with on my own.
I am hoping there is some git filter-branch type of functionality that will let me do this without handling tags and other internals myself.
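A minimal sketch of the loop from the pseudocode, assuming both repositories are local and the commits go to whatever branch is checked out in the target. The "metadata from sha" part is delegated to git commit --reuse-message, which copies the log message and the authorship (name, email, author date) of an existing commit; for that, the source commits are first fetched into the target so their SHAs resolve there. The function name replay_branch, the state file and the FILTER argument are hypothetical. Only first parents are followed, so merges are flattened, and the committer is whoever runs the script.

```shell
#!/bin/sh
# replay_branch SRC DST STATE BRANCH FILTER   (hypothetical helper)
#   SRC    - path to the source repository
#   DST    - path to the target repository; its current branch receives the commits
#   STATE  - file holding last_applied_sha; missing or empty means "start from the root"
#   BRANCH - source branch to replay
#   FILTER - shell command run inside DST's work tree for every commit
replay_branch() {
    src=$1 dst=$2 state=$3 branch=$4 filter=$5

    # Make the source commits available in DST so --reuse-message can read them.
    git -C "$dst" fetch -q "$src" "$branch" || return 1

    if [ -s "$state" ]; then
        range="$(cat "$state")..FETCH_HEAD"   # resume after last_applied_sha
    else
        range=FETCH_HEAD                      # first run: start at the root commit
    fi
    shas=$(git -C "$dst" rev-list --reverse --first-parent "$range") || return 1

    for sha in $shas; do
        # Replace DST's index and work tree with the snapshot of $sha.
        git -C "$dst" read-tree -u --reset "$sha" || return 1
        # Apply the code changes (reformatting etc.).
        (cd "$dst" && sh -c "$filter") || return 1
        git -C "$dst" add -A || return 1
        # Reuse message, author name/email and author date of $sha.
        git -C "$dst" commit -q --allow-empty --reuse-message="$sha" || return 1
        # Record progress so an interrupted run can resume from here.
        printf '%s\n' "$sha" > "$state"
    done
}
```

Calling it again after new commits arrive in the source picks up after the SHA stored in STATE. If a run dies between the commit and the state update, that one commit would be replayed twice on the next run.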
1. Set up the target repository by cloning the source.
$ git clone <sourceRepo>
2. Check out the relevant branch. Replace branchname with the actual branch name (here and in all the following steps).
$ git checkout branchname
3. Do an initial rewrite using filter-branch and a --tree-filter, updating tags in the process with --tag-name-filter. The filter is only an example: it replaces the first occurrence of "text" with "modified" on each line of every file matching the "*.txt" glob in the top-level directory.
$ git filter-branch --tree-filter 'sed -i "s/text/modified/" *.txt' --tag-name-filter cat -- branchname
4. Create tags to record the last source and target revs.
$ git tag lastsourcerev origin/branchname
$ git tag lasttargetrev branchname
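Steps 1 to 4 can be bundled into one function, much like the update steps are packaged further down. A minimal sketch, assuming a POSIX shell; initial_rewrite and its argument order are hypothetical, and FILTER_BRANCH_SQUELCH_WARNING only silences the warning and pause that newer Git versions add before filter-branch runs. The body is a subshell so the caller's working directory stays unchanged.

```shell
#!/bin/sh
# initial_rewrite SOURCE DIR BRANCH TREE_FILTER   (hypothetical helper)
initial_rewrite() (
    src=$1 dir=$2 branch=$3 filter=$4
    # Step 1: clone the source.
    git clone -q "$src" "$dir" &&
    cd "$dir" &&
    # Step 2: check out the branch to rewrite.
    git checkout -q "$branch" &&
    # Step 3: rewrite its whole history and carry tags along.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
        --tree-filter "$filter" --tag-name-filter cat -- "$branch" &&
    # Step 4: remember where the source and the rewritten branch ended.
    git tag lastsourcerev "origin/$branch" &&
    git tag lasttargetrev "$branch"
)
```

For example, initial_rewrite <sourceRepo> target branchname 'sed -i "s/text/modified/" *.txt' performs the four steps above in a new directory called target. Note that this particular tree filter fails (and aborts filter-branch) on any commit without a top-level *.txt file.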
Whenever it is time to pick up new revisions from the source repo, the following steps can be used. They apply the tree filter only to the new commits and graft the new (rewritten) commits onto the existing (previously rewritten) ones.
1. Fetch new commits/tags from the source repo:
$ git fetch origin
2. Reset to the new tip of the source branch.
$ git reset --hard origin/branchname
3. Apply filter-branch with an extra --parent-filter to graft the new commits onto the existing ones. Note that we need the -f (force) option because the previous filter-branch command left a backup in refs/original. The --parent-filter makes use of the tags that store the last source and target revs: the first new commit still has the last source rev as its parent, and the sed call replaces that ID with the last target rev. The whole filter-branch run is limited to the commits between the last processed source rev and the newest source commit (the one we reset branchname to).
$ git filter-branch -f --tree-filter 'sed -i "s/text/modified/" *.txt' --tag-name-filter cat --parent-filter "sed s/$(git rev-parse lastsourcerev)/$(git rev-parse lasttargetrev)/g" -- lastsourcerev..branchname
4. Update the tracking tags to the new situation:
$ git tag -f lastsourcerev origin/branchname
$ git tag -f lasttargetrev branchname
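The --parent-filter from step 3 can be looked at in isolation. filter-branch writes each commit's parent list to the filter's stdin in git commit-tree form ("-p <sha>" per parent, empty for a root commit) and uses the output as the new parent list. Parents that were already rewritten in the same run are passed in their rewritten form, so only the first new commit still names lastsourcerev, and the sed call swaps exactly that ID. The helper name graft_parents and the short fake IDs below are illustrative only.

```shell
#!/bin/sh
# graft_parents OLD NEW   (hypothetical name for the sed call used in step 3)
# Reads a parent string such as "-p <sha1> -p <sha2>" on stdin and
# replaces every occurrence of OLD with NEW.
graft_parents() {
    old=$1   # lastsourcerev: last source commit that was already rewritten
    new=$2   # lasttargetrev: its rewritten counterpart
    sed "s/$old/$new/g"
}
```

For example, printf '%s\n' '-p 1111aaaa -p 2222bbbb' | graft_parents 1111aaaa 9999ffff prints -p 9999ffff -p 2222bbbb: the grafted parent changes and any other (merge) parent passes through untouched.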
Repeat these steps as needed. Once no more updates are needed, the lastsourcerev and lasttargetrev helper tags can be deleted.
Note that the update process can be split into arbitrarily small increments by resetting the branch to an in-between source commit instead of the tip, and then recording that commit as lastsourcerev. Likewise, the initial rewrite can be split up by creating a branch that points at an in-between source commit, rewriting it and recording that commit as lastsourcerev, and then applying the update steps to go further.
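These increments can be driven by a loop, which gives a stop-and-resume workflow: every call rewrites at most N more source commits (counted along the first-parent chain) and moves the two tags only after filter-branch has succeeded, so the process can be interrupted between calls and continued later. A sketch, assuming it runs inside the target clone after steps 1 to 4 and after a git fetch origin; advance_chunk and the chunk size are hypothetical.

```shell
#!/bin/sh
# advance_chunk BRANCH N TREE_FILTER   (hypothetical helper, run inside the target clone)
# Rewrites at most N more commits of origin/BRANCH after lastsourcerev.
# Returns 0 after a chunk; non-zero when nothing is left or a step failed.
advance_chunk() {
    branch=$1 n=$2 filter=$3
    old_src=$(git rev-parse lastsourcerev) || return 1
    old_tgt=$(git rev-parse lasttargetrev) || return 1
    # The N-th pending commit on the first-parent chain, or the tip if fewer remain.
    stop=$(git rev-list --reverse --first-parent "lastsourcerev..origin/$branch" | sed -n "${n}p")
    [ -n "$stop" ] || stop=$(git rev-parse "origin/$branch")
    [ "$stop" != "$old_src" ] || return 1      # already up to date
    git checkout -q "$branch" &&
    git reset -q --hard "$stop" &&
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
        --tree-filter "$filter" --tag-name-filter cat \
        --parent-filter "sed s/$old_src/$old_tgt/g" \
        -- "lastsourcerev..$branch" &&
    # Move the bookmarks only once the chunk has been rewritten.
    git tag -f lastsourcerev "$stop" &&
    git tag -f lasttargetrev "$branch"
}
```

A run could then be git fetch origin followed by while advance_chunk master 100 'sed -i "s/text/modified/" *.txt'; do :; done, which works through the pending commits 100 at a time and also stops at the first failure.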
Note also that this process relies solely on filter-branch to avoid the problems with tag rewrites and merge commits that rebasing newly incoming commits would otherwise inevitably cause.
Packaged as a shell script, the incremental update part could look like this:
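#!/bin/sh
# Incremental update: rewrite only the source commits fetched since the last run.
REMOTE=origin
LOCAL_BRANCH=master
REMOTE_BRANCH=origin/master
SOURCE_REV_TAG=lastsourcerev   # last source commit that has been rewritten
TARGET_REV_TAG=lasttargetrev   # its rewritten counterpart in the target history
TREE_FILTER='sed -i "s/text/modified/" *.txt'
# Newer Git versions print a warning and pause before filter-branch runs unless this is set.
export FILTER_BRANCH_SQUELCH_WARNING=1
set -e
git fetch "$REMOTE"
# Nothing to do if the source branch has not moved since the last run.
if [ "$(git rev-parse "$SOURCE_REV_TAG")" = "$(git rev-parse "$REMOTE_BRANCH")" ]
then
    echo "no new commits, nothing to do"
    exit 0
fi
# Move the local branch to the new (not yet rewritten) source tip ...
git checkout "$LOCAL_BRANCH"
git reset --hard "$REMOTE_BRANCH"
# ... then rewrite only SOURCE_REV_TAG..HEAD; the parent filter re-attaches
# the first new commit to the previously rewritten tip.
git filter-branch -f --tree-filter "$TREE_FILTER" \
    --tag-name-filter cat \
    --parent-filter "sed s/$(git rev-parse "$SOURCE_REV_TAG")/$(git rev-parse "$TARGET_REV_TAG")/g" \
    -- "$SOURCE_REV_TAG"..
# Record the new positions for the next run.
git tag -f "$SOURCE_REV_TAG" "$REMOTE_BRANCH"
git tag -f "$TARGET_REV_TAG"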
The only edge case is when no new commits are available. In that case git reset --hard would move the local branch to the (unrewritten) remote branch, but no filter step would be applied because there are no revs to rewrite, leaving the branch on the unfiltered source history. The script above handles that by checking whether the source rev tracking tag points at the same commit as the remote branch.
Rather than an interactive rebase, you could use git filter-branch, which visits every commit of your repo and applies any utility (or code reformatter) you want.
Since filter-branch is a local operation, there is no need for "another" repo: you apply it to a local clone of your repo.
Note that it does not support a pause/resume workflow, so you will need to let it run to completion.
See "Reformatting Your Codebase with git filter-branch" (by Elliot Chance) as an example:
git filter-branch --tree-filter 'phpcbf $(\
git show $GIT_COMMIT --name-status | egrep ^[AM] |\
grep .php | cut -f2)' -- --all
For each commit, this looks at added/modified files only, keeps the PHP ones, and runs a formatting tool (phpcbf) on them.
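One caveat with that command: a tree filter always starts from the original snapshot of each commit, so a file that was formatted while rewriting an earlier commit comes back unformatted in every later commit that does not touch it. A slower but consistent variant formats every PHP file in every commit. The sketch assumes a formatter that takes file names as arguments (phpcbf in the article) and GNU find/xargs for -print0, -0 and -r; reformat_php_history is a hypothetical wrapper. The formatter command is spliced into the filter string unquoted, and "|| true" is there because some formatters exit non-zero after fixing files, which would otherwise abort filter-branch (it also hides genuine formatter failures).

```shell
#!/bin/sh
# reformat_php_history FORMATTER [ARGS...]   (hypothetical wrapper)
# Runs FORMATTER on every *.php file of every commit on all refs, so files
# that a given commit did not touch are formatted as well.
reformat_php_history() {
    # filter-branch evaluates this string in its own shell, inside a checkout
    # of each original commit; "$*" becomes part of that command line.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --tree-filter \
        "find . -type f -name '*.php' -print0 | xargs -0 -r $* || true" \
        -- --all
}
```

With the article's tool, the call would be reformat_php_history phpcbf.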
That does not prevent anyone from committing in the meantime.
Your collaborators will need to clone the new (formatted) repo, add their existing clone as a remote, fetch, and rebase their own commits (only the new ones) on top of the newly formatted branch history of the new repo.
In other words, each collaborator has to perform a reconciliation step to integrate the work done during the reformatting.
If not, the process needs to be reversed: your new repo must add the old one (which everybody has been pushing to, assuming the recent commits are properly formatted) as a remote named 'oldRepo':
cd /path/to/new/repo
git remote add oldRepo /path/to/old/central/repo
git fetch oldRepo
git branch --contains
git rebase --onto abranch acommit~ oldRepo/abranch
That replays every commit on the branch 'oldRepo/abranch' after the parent of the detected old commit (acommit) onto abranch in the new repo, which is missing those commits because they were made and pushed while the new repo was being rewritten.
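As a sketch, the same sequence wrapped in a function, assuming acommit is the oldest commit on oldRepo/abranch that the new repo lacks; the helper name catch_up and its arguments are hypothetical. Because oldRepo/abranch is a remote-tracking branch rather than a local one, git rebase does its work on a detached HEAD, so the sketch ends by pointing abranch at the result with git checkout -B. A conflict stops the function at the rebase, to be resolved by hand as usual.

```shell
#!/bin/sh
# catch_up OLD_REPO BRANCH FIRST_MISSING   (hypothetical helper, run in the new repo)
#   FIRST_MISSING - oldest commit on the old repo's BRANCH that the
#                   rewritten repo does not have yet ("acommit")
catch_up() {
    old_repo=$1 branch=$2 first_missing=$3
    # Reuse the remote if an earlier run already added it.
    git remote get-url oldRepo >/dev/null 2>&1 ||
        git remote add oldRepo "$old_repo" || return 1
    git fetch -q oldRepo || return 1
    # Replay FIRST_MISSING and everything after it onto the rewritten branch.
    git rebase -q --onto "$branch" "$first_missing~" "oldRepo/$branch" || return 1
    # The rebase ran on a detached HEAD; point BRANCH at the result and switch to it.
    git checkout -q -B "$branch"
}
```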