Reputation: 13

Git: Working with two divergent branches

I read numerous articles on this topic, but I'm still not sure how to proceed.

I have an application that has grown over the past 15 years; until now, the source code has been managed using another source control system. I'm planning to migrate to Git and intend to use a branching model like the one described here.

These branches need to be migrated from our current system to Git:

Current system | Git
---------------------------
Dev            | master
v1             | release/v1

v1 is a legacy release that needs to be maintained for now but will be deprecated at some point. It is very different from Dev and will never be merged entirely.

What I would like to do is:

1. Initialize the repository
2. Check in Dev code as master
3. Switch to branch release/v1 and replace all files with the current v1 files, then push them to the server

Now comes the part that's unclear to me: I would like to be able to merge specific changes from release/v1 to master in the future for parts of the application that are still similar enough, e.g. when fixing a bug in v1. To be able to do that, I need to do an initial upstream merge from release/v1 to master where all differences are ignored, so master remains as it is and only changes after that point are taken into consideration.

Would git merge -Xtheirs be the way to go in this situation?

Thanks, Jan

Edit:

I think I may have found my solution - not sure if it's elegant, but it seems to result in the correct state:

1. Create repo and check in files for master
2. Branch release/v1
3. Replace contents of release/v1 with files from previous SCM
4. Check out master
5. Merge with strategy ours: git merge -s ours release/v1
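
For anyone who wants to try these steps in a scratch repository first, here is a minimal sketch. The file names (app.txt, shared.txt) and their contents are placeholders, not the real application. Note that -s ours is a merge strategy that records the merge and keeps master's tree untouched, whereas -X theirs is an option to the default strategy that only decides conflicting hunks in favour of the other branch, so it would not leave master as it is.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch master, whatever the local default is
git config user.name "Example" && git config user.email "example@example.invalid"

# Check in the Dev code as master (two placeholder files)
printf 'dev app\n' > app.txt
printf 'line 1\nline 2\nline 3\n' > shared.txt
git add . && git commit -q -m 'Import Dev as master'

# Branch release/v1 and replace its contents with the v1 files
git checkout -q -b release/v1
printf 'legacy app\n' > app.txt
git add . && git commit -q -m 'Import v1'

# Record the merge on master while keeping master's tree exactly as it was
git checkout -q master
tree_before=$(git rev-parse 'HEAD^{tree}')
git merge -q -s ours --no-edit release/v1
tree_after=$(git rev-parse 'HEAD^{tree}')

# Later: a bug fix on release/v1 in a file both branches still share
git checkout -q release/v1
printf 'line 1\nline 2 fixed\nline 3\n' > shared.txt
git commit -q -am 'Fix bug in v1'
git checkout -q master
git merge -q --no-edit release/v1

app_on_master=$(cat app.txt)
fixed_line=$(sed -n 2p shared.txt)
echo "app.txt on master: $app_on_master"
echo "shared.txt line 2: $fixed_line"
```

In this toy setup the second merge only picks up the fix made after the initial merge; the Dev version of app.txt stays in place.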

Upvotes: 0

Answers (2)

torek

Reputation: 489848

Consider creating your Git repository differently. Start by creating the existing v1 legacy version as the first commit ever. The very first commit in any Git repository has an interesting property.

Let's start with how branches grow, on the assumption that you have an existing, working repository. Imagine a small repository with only one named branch—master—and three commits. The actual names of commits are big ugly hash IDs, but we'll work here with single uppercase letters. (Our imaginary repository can only hold 26 commits!)

Let's draw the commits the way Git has them, too. Each commit remembers, as its parent, the hash ID of the previous commit. So since we have three commits, with the third one being commit C, commit C remembers commit B as its parent:

  <-B <-C

Of course B also has a parent, and it remembers that A is its parent:

A <-B <-C

But what is the parent commit for commit A? The answer is: there is none. It can't have a parent. It's the first commit! It is parent-less. It is sui generis. Git calls it a root commit.

The root commit is where all action starts—but Git works backwards, so it's really where all action ends. The way Git remembers the hash ID of commit C is to store it in the name master. Real hash IDs are big, ugly, and seemingly-random: there's no good way to remember them other than to just write them down, so Git writes them down in branch names like master.

Commits themselves are permanent, read-only, and incorruptible.¹ Once you've made a commit, you can never change it. So the hash IDs wired into each commit as the parent are unchangeable, and we don't really need to draw the arrows as arrows. The hash IDs in branch names, though, are highly changeable! So let's keep drawing those as arrows:

A--B--C   <-- master

We say that the name master points to commit C, and because C records B's hash, C points to B. B points to A, and A, as the root commit, points nowhere.

Now if we decide to add a new commit, we start with the following:

$ git checkout master

This attaches our HEAD to the name master. Then we fuss with files and git add and git commit to make a new commit D. D's parent is C:

A--B--C   <-- master (HEAD)
       \
        D

The last (well, almost last) step of git commit is that Git writes D's hash ID into the name to which HEAD is attached. So master now points to D, not C:

A--B--C
       \
        D   <-- master (HEAD)

and we've thus created a new commit and added it to our master branch.
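
If you want to watch the branch name move, a throwaway repository is enough. This is only a sketch: the file name and commit messages are made up, and the single letters exist only in the commit messages, since the real names are hash IDs.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch master, whatever the local default is
git config user.name "Example" && git config user.email "example@example.invalid"

# Build three commits standing in for A, B and C
for name in A B C; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done
old_tip=$(git rev-parse master)              # hash ID stored in the name master: commit C

# Make commit D while HEAD is attached to master
echo D > file.txt
git commit -q -am 'commit D'
new_tip=$(git rev-parse master)              # the name master now holds D's hash ID
parent_of_new=$(git rev-parse 'master^')     # D's recorded parent

echo "C = $old_tip"
echo "D = $new_tip (parent: $parent_of_new)"
```

The parent printed for D should be the hash that master held before the commit.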

¹The "permanent" part is only mostly-true: a commit that cannot be reached by some external name, such as a branch name, is eventually garbage-collected and deleted. The rest is guaranteed: you cannot change any existing Git object, and Git will detect (and complain about) any corruption when it notices that the name of the object no longer matches the cryptographic checksum of the object. (The names are the cryptographic checksums, so if they do not match, corruption has occurred.)

How this applies to your situation

So, suppose we make the very first commit—our "commit A"—out of your v1 legacy version. I'll omit all the Git commands, as you probably have those down:

A   <-- master (HEAD)

Now let's create a new name, say, branch-v1, that also points to commit A. We do this with the simple command:

$ git branch branch-v1

which gives us this:

A   <-- master (HEAD), branch-v1

Now we remove every file in the work-tree and the index:

$ git rm -r .

and copy in all the files from the development system, e.g.:

$ ssh development-system 'cd some/path; tar cf - .' | tar xf -

and then git add them all and git commit:

$ git add .
$ git commit -m 'import development version'

This makes our new commit B, and changes the name master to point to B. B's parent is A. Let's draw it:

A   <-- branch-v1
 \
  B   <-- master (HEAD)

There's something very important about this graph drawing. Commit A is on branch branch-v1, but it's also on master. In Git, commits can be on more than one branch at a time.
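
Put together, the setup might look like the sketch below. The two files are placeholders standing in for the real v1 and development trees, and plain printf replaces the ssh/tar copy so that it runs anywhere.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch master, whatever the local default is
git config user.name "Example" && git config user.email "example@example.invalid"

# Commit A: the v1 legacy sources (placeholder files)
printf 'legacy app\n' > app.txt
printf 'shared helper\n' > util.txt
git add . && git commit -q -m 'import v1 legacy version'
a=$(git rev-parse HEAD)
git branch branch-v1                         # a second name for commit A

# Commit B: remove everything, then bring in the development sources
git rm -q -r .
printf 'dev app\n' > app.txt
printf 'shared helper\n' > util.txt
git add . && git commit -q -m 'import development version'

base=$(git merge-base master branch-v1)
echo "commit A:   $a"
echo "merge base: $base"
echo "branches containing A:"
git branch --contains "$a"
```

Both branch names should be listed as containing A, and A should be reported as the merge base.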

Time marches on

Suppose it's now the next day (week, whatever) and there is a patch to v1. You can now, on any Git repository that's a clone of this one (or on this one if you like), check out the branch-v1 branch:

A   <-- branch-v1 (HEAD)
 \
  B--...--C   <-- master

We've made more commits on the main line (and maybe it's all fancied up with multiple branch names and so on, but commits A and B are certainly still in there, shaped like this). So now let's add and commit the updated v1 code:

A--D   <-- branch-v1 (HEAD)
 \
  B--...--C   <-- master

If we wish to take up the changes between snapshots A and D, it's now mostly-trivial (depending on just how radically different C is from A) to do so, using git merge. Running:

$ git checkout master
$ git merge branch-v1

will tell Git to find the latest shared / common commit between the two branches master (to which we just attached our HEAD) and branch-v1. So Git will search the history, following C back to B back to A, and following D to A. Commit A is on both branches. In fact, it's the latest such commit (the notion of latest here is rather wooly but I suspect you know what it means), so commit A is the merge base of commits C and D.

Git will therefore compare A vs D, to see what changed on branch-v1 since we were last in sync there. It will compare A vs C, to see what we changed on master, too. Git will then combine these two sets of changes and make a new merge commit, on our current branch master (remember, our HEAD is attached to master now):

A----------D   <-- branch-v1
 \          \
  B--...--C--E   <-- master (HEAD)

The merge commit links back to the previous HEAD, i.e., C, as its first parent, and to the commit we merged (D) as its second parent. (This parent numbering is important later, if you want it to be.)
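
Here is a small sketch of that first merge, again with placeholder files, where the v1 patch touches a file that the development line has not changed. It records the hashes of C and D so the parents of the merge commit can be checked afterwards.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch master, whatever the local default is
git config user.name "Example" && git config user.email "example@example.invalid"

# A (v1 import) with branch-v1 pointing at it, then B and C on master
printf 'legacy app\n' > app.txt
printf 'shared helper\n' > util.txt
git add . && git commit -q -m 'import v1 legacy version'
a=$(git rev-parse HEAD)
git branch branch-v1
printf 'dev app\n' > app.txt
git commit -q -am 'import development version'
printf 'dev app, more work\n' > app.txt
git commit -q -am 'more development'
c=$(git rev-parse HEAD)

# D: a patch to v1 in a file both lines still share
git checkout -q branch-v1
printf 'shared helper, patched\n' > util.txt
git commit -q -am 'patch v1'
d=$(git rev-parse HEAD)

# Merge on master: Git diffs A vs C and A vs D and combines the results in E
git checkout -q master
base=$(git merge-base master branch-v1)
git merge -q --no-edit branch-v1
first_parent=$(git rev-parse 'HEAD^1')
second_parent=$(git rev-parse 'HEAD^2')
echo "merge base:    $base"
echo "first parent:  $first_parent"
echo "second parent: $second_parent"
```

After the merge, master should have the development app.txt together with the patched util.txt.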

If yet more changes are made to v1 we can incorporate them with another git merge. We start with this, having picked up commit F somehow:

A----------D--F   <-- branch-v1
 \          \
  B--...--C--E   <-- master (HEAD)

and again run git merge branch-v1, which finds the nearest common commit: it looks at F which leads to D; it looks at E which leads to C and D both; and there we have our common commit: it's now D. Git compares D vs F to see what they did, compares D vs E to see what we did, combines these changes, applies them to D's source, and makes new merge commit G:

A----------D--F   <-- branch-v1
 \          \  \
  B--...--C--E--G   <-- master (HEAD)
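
The way the merge base moves forward can be checked with git merge-base. This sketch uses placeholder files and leaves out commit C to keep it short, so E directly follows B here.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch master, whatever the local default is
git config user.name "Example" && git config user.email "example@example.invalid"

# A (v1 import) with branch-v1, then B (development import) on master
printf 'legacy app\n' > app.txt
printf 'helper\n' > util.txt
git add . && git commit -q -m 'import v1 legacy version'
git branch branch-v1
printf 'dev app\n' > app.txt
git commit -q -am 'import development version'

# D on branch-v1, merged into master as E
git checkout -q branch-v1
printf 'helper, first patch\n' > util.txt
git commit -q -am 'first v1 patch'
d=$(git rev-parse HEAD)
git checkout -q master
git merge -q --no-edit branch-v1

# F on branch-v1: the merge base is now D, no longer A
git checkout -q branch-v1
printf 'helper, second patch\n' > util.txt
git commit -q -am 'second v1 patch'
git checkout -q master
base=$(git merge-base master branch-v1)
git merge -q --no-edit branch-v1             # merge commit G
util_on_master=$(cat util.txt)
echo "merge base before second merge: $base"
echo "util.txt on master: $util_on_master"
```

Because the second merge starts from D, only the change between D and F is brought over.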

Your branch flow model complicates all of this (for various reasons, mostly good ones :-) ) but the fundamentals remain the same.

The reason we put v1 in as the first commit was to make sure that commit A, our root commit, which contains the source of v1, is on all branches. That way anything that happens later can be compared to v1.

Tagging

You can find a root commit in a repository by doing graph traversal: when there's no more graph to traverse, you have reached a root commit. As long as the graph has only one root (the normal case), that's the root. But Git does let you create more root commits, and if you merge independent repositories, you will get extra roots that way as well. In any case, walking the graph to find the root is a pain—so Git offers tags as well as branch names.
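
For the record, the graph walk does not have to be done by hand: git rev-list can be limited to commits that have no parents. A sketch in a scratch repository with three made-up commits:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch master, whatever the local default is
git config user.name "Example" && git config user.email "example@example.invalid"

for name in A B C; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

# Commits reachable from HEAD that have zero parents, i.e. the root(s)
root=$(git rev-list --max-parents=0 HEAD)
# Walking the whole history: in this linear history the last commit listed is the first one made
oldest=$(git rev-list HEAD | tail -n 1)
# With --parents, each line is a commit followed by its parents; a root has none
root_line=$(git rev-list --parents -n 1 "$root")
echo "root commit: $root"
```

In a repository with several roots the first command prints one hash per root, which is part of why a tag is the more convenient handle.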

A branch name, as we saw, simply identifies one particular commit, but the name moves over time, as we add more commits to the branch. A tag name does the same thing as a branch name, but unlike branch names, tag names don't move. So once we've created the root commit on the branch-v1 branch, we can tag that commit:

$ git tag v1.0

to make the string v1.0 mean that particular commit.

Tags can be simple (a "lightweight tag") and just identify a commit directly, or fancied-up (an "annotated tag"). The fancy one carries extra data, and then identifies the commit. The use of either tag is basically the same—v1.0 is another name for the raw hash ID—so choose an annotated tag if you want to add the extra data that an annotated tag brings.
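
A quick sketch of both kinds of tag in a scratch repository. The second tag name, v1.0-annotated, and its message are made up for the example; in practice you would pick one kind of tag and one name.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch master, whatever the local default is
git config user.name "Example" && git config user.email "example@example.invalid"

echo 'v1 sources' > file.txt
git add file.txt && git commit -q -m 'import v1 legacy version'
a=$(git rev-parse HEAD)

git tag v1.0                                      # lightweight: just a name for commit A
git tag -a v1.0-annotated -m 'v1 legacy import'   # annotated: a tag object that points at A

# Add another commit: master moves on, the tags stay put
echo 'development sources' > file.txt
git commit -q -am 'import development version'

light_type=$(git cat-file -t v1.0)                # object type the name resolves to
annot_type=$(git cat-file -t v1.0-annotated)
light_target=$(git rev-parse v1.0)
annot_target=$(git rev-parse 'v1.0-annotated^{commit}')   # peel the tag object to its commit
echo "v1.0 -> $light_type $light_target"
echo "v1.0-annotated -> $annot_type, commit $annot_target"
```

Both names still lead to commit A after master has moved on; the annotated one just goes through an extra tag object first.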

(There's another difference when merging against an annotated tag, but I'll skip that for now.)

Upvotes: 1

Kevin Hencke

Reputation: 168

Porting history from another SCM system is cumbersome. Luckily, since you are starting with two fixed and well-understood points and want to draw a deliberate and gradual path between them, there is an easier way than working with merge commands.

Start by establishing the branches master and release/v1 as you describe. Then, bring changes from release/v1 into master one file at a time.

Given src/someFile.cpp present in both master and release/v1, start by checking out master and then ask Git to bring in the other branch's version.

Execute git checkout release/v1 -- src/someFile.cpp from the root of the repository. The syntax git checkout branch1 -- fileA tells Git to bring fileA into the current branch from branch1, overwriting any existing file with the same name.

History is preserved as usual; each time you run this inter-branch checkout, it stages a change, which must then be committed. You may do this for multiple files, then commit them all at once.
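
A sketch of the whole round trip in a scratch repository. The file contents and commit messages are placeholders; only src/someFile.cpp and the branch names come from the text above.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # call the first branch master, whatever the local default is
git config user.name "Example" && git config user.email "example@example.invalid"

# master with the development version of the file
mkdir src
printf 'dev version\n' > src/someFile.cpp
git add . && git commit -q -m 'Import Dev as master'

# release/v1 with the v1 version of the same file
git checkout -q -b release/v1
printf 'v1 version\n' > src/someFile.cpp
git commit -q -am 'Import v1'

# Back on master, take the file exactly as it is on release/v1
git checkout -q master
git checkout release/v1 -- src/someFile.cpp
staged=$(git diff --cached --name-only)      # the checkout has already staged the change
content=$(cat src/someFile.cpp)
git commit -q -m 'Bring src/someFile.cpp over from release/v1'

# Count the hashes on the commit line: the commit itself plus its parents
hash_count=$(git rev-list --parents -n 1 HEAD | wc -w)
echo "staged: $staged"
echo "content: $content"
```

The resulting commit is an ordinary one-parent commit on master, so as far as I know Git records nothing that links it to release/v1. Note also that the file is replaced wholesale by the release/v1 version.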