Archit Arora

Reputation: 2636

Prevent files from being overwritten during git merge

Consider the following scenario. I have three branches - Master, Develop and Test. All three branches have a config file (say a JenkinsFile) that contains branch specific configuration. Please note that the configuration in this file is different for all three branches. Now, I create a feature branch off Master, make some changes and merge this feature branch with Develop and then with Test.

The question is - how do I prevent the JenkinsFile from being overwritten by any merge? I want the JenkinsFile to remain intact and not be affected by any merge. Is there a way to "lock" these files? Does gitignore work in this case?

Cheers!

Upvotes: 1

Views: 2757

Answers (2)

torek

Reputation: 490058

The question is - how do I prevent the JenkinsFile from being overwritten by any merge? I want the JenkinsFile to remain intact and not be affected by any merge. Is there a way to "lock" these files?

No.

There is a completely different way to go about this, though, that sidesteps the entire problem. In fact, there are multiple ways, but I'll show just one. There's an unfortunate problem in terms of getting to the state where things all work as desired, but once you do get there, you're good. The end goal here is to not have a committed file named Jenkinsfile (or JenkinsFile, but I've used the lowercase-F spelling below) whose content is branch-dependent. Instead, just have an uncommitted work-tree-only file whose name is Jenkins[Ff]ile and whose content is branch-dependent. Make the committed files have other names.

Background

Fundamentally, git merge works by combining work done, i.e., combining the changes to some file(s) since some common starting point. But Git doesn't store changes; Git stores snapshots. This creates a problem for git merge, and the solution requires that you understand how Git's commit graph works.

Almost every commit in a Git repository has at least one parent commit, which is that commit's immediate predecessor. Most have exactly one parent; commits of type "merge" have at least two, and usually exactly two. In fact, the presence of more than one parent is what defines a commit to be a merge commit. The other common special case is that the very first commit in a repository has no parent, because it can't have one, because it was the first commit. (Commits with three or more parents are called octopus merges but they do nothing you can't do with regular merges, so they're mainly for showing off. :-) )

These links, in which a commit stores the hash ID of its parent(s)—remember that each commit is found by its unique hash ID, that Git assigned to the commit when you made the commit—form backwards chains. These backwards chains are the history in the repository. History is commits; commits are history. A branch name simply identifies the (single) last commit that we wish to claim to be part of that branch:

... <-F <-G <-H   <--master

Here, instead of actual hash IDs, I've drawn in single uppercase letters that stand in for each commit. The name master holds the actual hash ID of commit H. We say that master points to H. H holds the hash ID of its parent G, so H points to G, which points to F, and so on, backwards down the line.
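A minimal sketch of that backwards chain, assuming a throwaway repository in a temporary directory (the empty commits, the commit messages F, G and H, and the user name and email are illustrative only):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example" && git config user.email "example@example.com"

# Three empty commits standing in for F, G and H.
for name in F G H; do
    git commit -q --allow-empty -m "$name"
done

tip=$(git rev-parse master)       # the hash ID stored under the name master (commit H)
parent=$(git rev-parse master^)   # the parent hash ID that H stores (commit G)

# Walk backwards from the tip, printing each commit next to its parent's hash.
git log --format='%h %s  parent: %p' master
```

The last line of the log output has an empty parent field, because the first commit has no parent.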

Nothing inside any commit can ever change, so we don't need the internal arrows; we just have to remember that they go backwards. It's actually very hard to go forwards in Git: almost all operations start at the end(s) and work backwards. Once we have more than one branch, this gives us a picture that looks like this:

          G--H   <-- master
         /
...--E--F
         \
          I--J   <-- develop
              \
               K   <-- test

To git checkout a branch means extract the snapshot from the tip commit of that branch. So git checkout master extracts the snapshot from commit H, while git checkout develop or git checkout test extracts those snapshots in turn. Also, doing a git checkout of some branch name attaches the special name HEAD to that branch. This is how Git knows which branch (and commit) is the current one.
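One way to watch HEAD being re-attached is git symbolic-ref, which prints the branch that HEAD currently names. This is a small sketch in a throwaway repository; the single empty commit and the identity settings are assumptions made only so the example runs on its own:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "F"
git branch develop

git checkout -q develop
head_after_develop=$(git symbolic-ref HEAD)   # refs/heads/develop
git checkout -q master
head_after_master=$(git symbolic-ref HEAD)    # refs/heads/master
echo "$head_after_develop then $head_after_master"
```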

When you run git merge, you give Git the name of some other commit. That doesn't have to be a branch name—any name for a commit will serve—but giving it a branch name works fine, since that names the tip commit of that branch. So if you git checkout master and then run git merge develop, you start with:

          G--H   <-- master (HEAD)
         /
...--E--F
         \
          I--J   <-- develop
              \
               K   <-- test

and Git finds commit J. Git then works backwards from both the current commit H and the named commit J to find the merge base of these two commits.

The merge base is, loosely, the first commit we get to from both tips. That's a commit that's on both branches, and in this case, that's obviously commit F. The idea of a merge base is crucial to understanding how merge works. The goal of the merge is to combine work, and that work can be found by comparing the snapshot in commit F, one comparison at a time, to each of the two tip commits H and J:

git diff --find-renames <hash-of-F> <hash-of-H>    # what we changed
git diff --find-renames <hash-of-F> <hash-of-J>    # what they changed
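You can reproduce both comparisons yourself, since git merge-base prints the hash of the merge base. The sketch below builds a tiny F, H, J history in a temporary repository; the file names ours.txt and theirs.txt are made up for the illustration:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo base > shared.txt
git add shared.txt && git commit -q -m "F"
git branch develop

echo ours > ours.txt
git add ours.txt && git commit -q -m "H"

git checkout -q develop
echo theirs > theirs.txt
git add theirs.txt && git commit -q -m "J"
git checkout -q master

# The commit both tips lead back to, then the two diffs that merge would run.
base=$(git merge-base master develop)
base_subject=$(git log -1 --format=%s "$base")
ours_changed=$(git diff --find-renames --name-only "$base" master)
theirs_changed=$(git diff --find-renames --name-only "$base" develop)
echo "merge base: $base_subject; we changed: $ours_changed; they changed: $theirs_changed"
```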

To combine the changes, Git starts with all the files from F, and looks at which files we changed and which ones they changed. If we both changed different files, Git takes ours or theirs as appropriate. If we both changed the same file—this eventually brings up a philosophical problem which we'll get back to in a moment—Git attempts to smash our changes together with their changes, by assuming that if we touched some source line and they didn't, it should take ours, and if they touched some source line and we didn't, it should take theirs too. If we both touched the same lines of the same file, then either we did the exact same thing to those lines—in which case, Git takes one copy of that change—or there's a conflict.

If there are no conflicts, Git applies these combined changes to the snapshot in the merge base—in F, here—and uses the resulting files to write out a new snapshot. That new snapshot is a commit of type merge commit, having two parents. The first parent is the commit we were on before, H, and the second is the one we named with our argument, J, so the merge looks like this:

          G--H
         /    \
...--E--F      L   <-- master (HEAD)
         \    /
          I--J   <-- develop
              \
               K   <-- test

Note that nothing happens to any existing commit, nor to any other branch name. Only our own branch name, master (to which HEAD is attached), moves; master now points to the new merge commit that Git just made.
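The two parents of a merge commit can be inspected with the HEAD^1 and HEAD^2 notation. A sketch, again in a throwaway repository with invented file names, which performs a conflict-free merge and then looks at the result:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo base > shared.txt
git add shared.txt && git commit -q -m "F"
git branch develop
echo ours > ours.txt
git add ours.txt && git commit -q -m "H"
git checkout -q develop
echo theirs > theirs.txt
git add theirs.txt && git commit -q -m "J"
git checkout -q master

git merge -q --no-edit develop    # creates the merge commit (L in the drawing)

first_parent=$(git log -1 --format=%s HEAD^1)   # the commit we were on
second_parent=$(git log -1 --format=%s HEAD^2)  # the commit we named
develop_tip=$(git log -1 --format=%s develop)   # develop itself has not moved
echo "parents: $first_parent and $second_parent; develop still at $develop_tip"
```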

If the merge goes badly, due to merge conflicts, Git will leave a mess behind. The index, which I'm not going to get into here, will contain all the conflicting input files, and the work-tree will contain Git's attempt at merge, along with conflict markers. Your job is to clean up the mess, fix up the index, and finish the merge (with git merge --continue or git commit—the --continue just runs commit) by hand.

Your problem: Jenkinsfile

Suppose that in commit F, the merge base, there is a file named Jenkinsfile. This same file, with this same name, appears in commits H and J. The copies in H and J differ—you said they do, so we'll assume that they do. Therefore at least one differs from F, and perhaps both differ from F.

Git is going to assume that the file that is named Jenkinsfile in both branch tips is the same file that is named Jenkinsfile in F. Obviously, it's not quite the same file—the contents differ—but Git will assume that it is, and that you're trying to combine work done on it.

So, Git will diff the version of Jenkinsfile in F against that in H, and then diff it again, against the version in J. There will be some changes. If both branch tips have changes, Git will combine them (or declare a conflict). Result: bad. Otherwise, Git will take the version of the file from whichever "side" changed it. Is that the side you want? If so, result: good. If not, result: bad.

In summary, for this scenario, there are three possible results:

  • Base vs HEAD is the only change: the result is fine.
  • Base vs theirs is the only change: result is bad.
  • Base vs HEAD and base vs theirs both have changes: result is probably bad.
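The second case is the sneaky one, because Git reports no conflict at all. Here is a sketch of it in a temporary repository, where the one-line Jenkinsfile contents and the file app.txt are invented for the demonstration:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "env=master" > Jenkinsfile
git add Jenkinsfile && git commit -q -m "F"
git branch develop

# master changes something else and leaves Jenkinsfile alone.
echo "feature" > app.txt
git add app.txt && git commit -q -m "H"

# develop is the only side that changes Jenkinsfile.
git checkout -q develop
echo "env=develop" > Jenkinsfile
git commit -q -am "J"
git checkout -q master

git merge -q --no-edit develop
after_merge=$(cat Jenkinsfile)
echo "Jenkinsfile on master now says: $after_merge"
```

The merge completes cleanly, and master ends up with the develop configuration.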

It is of course possible that merge base commit F has no file named Jenkinsfile. And, it's possible that one or both tip commits have no such file. In this case, it gets a little trickier. We'll get to that in a moment.

The solution (and some issues getting there)

The solution here is to avoid having a single, fixed-name file, such as Jenkinsfile, in all commits when that file is intended to be branch-dependent. Suppose, instead, that commit F contains Jenkinsfile.master and Jenkinsfile.develop and Jenkinsfile.test. Then commit H will have a Jenkinsfile.master and Jenkinsfile.develop and Jenkinsfile.test too, and the changes from F to H in Jenkinsfile.master will be the ones you want to keep. Since commit J is in branch develop, it should always either have the same changes—imported from master at some point—or no changes at all. Git's merge will therefore do the right thing, in both cases.

The same logic applies to each of the other such files. Note that at this point, the commits identified by all branch tips should have no file named Jenkinsfile (without a suffix) at all. This is, of course, an idealized goal-state: to get there, you must actually make new commits in each branch, renaming the existing Jenkinsfile. But this will have no effect at all on any existing commits. All of that history in your repository is frozen for all time. This means that at some point, you'll run git merge and git merge will locate a merge base commit that has only Jenkinsfile, not Jenkinsfile.master, and not Jenkinsfile.develop or any other suffix.
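The renaming commits themselves are ordinary git mv commits, one per branch. A sketch for two of the branches, assuming a throwaway repository and a placeholder file body:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "pipeline placeholder" > Jenkinsfile
git add Jenkinsfile && git commit -q -m "F: historic commit with a plain Jenkinsfile"
git branch develop

git mv Jenkinsfile Jenkinsfile.master
git commit -q -m "H: rename Jenkinsfile for master"

git checkout -q develop
git mv Jenkinsfile Jenkinsfile.develop
git commit -q -m "J: rename Jenkinsfile for develop"
git checkout -q master

# The tips have the new names; the old merge base still has the old one.
master_files=$(git ls-tree --name-only master)
develop_files=$(git ls-tree --name-only develop)
base_files=$(git ls-tree --name-only "$(git merge-base master develop)")
echo "master: $master_files; develop: $develop_files; merge base: $base_files"
```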

Let's assume now that in H and J, you have already done this renaming, but in merge base F, you have not—obviously, since it's a historic commit. So F has a Jenkinsfile and no renamed files, while H and J have no Jenkinsfile but do have the renamed files.

Now, remember above where we showed the git diffs that git merge runs, to figure out what has changed since the merge base. One of the arguments is --find-renames. This directs Git to guess whether the file Jenkinsfile in F is "the same" file as Jenkinsfile.master in H, when comparing F and H. The same goes for the comparison of F vs J: is the old Jenkinsfile the same file as the new Jenkinsfile.develop?

If you followed the link to https://en.wikipedia.org/wiki/Ship_of_Theseus you will see that there's no philosophical right answer to the question of identity-over-time. But Git has its right answer, which is: If the file has a similarity index of 50% or better, it's the same file. We don't need to worry here about how Git computes this similarity index (it's a bit complicated); chances are very good that Git will detect the rename in both cases.
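To see what Git decides for a particular pair of commits, ask git diff for --name-status output: a detected rename shows up as a line starting with R followed by the similarity percentage. The sketch below uses a made-up eight-line file and appends one line during the rename, so the two versions are similar but not identical:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

printf 'stage%s\n' 1 2 3 4 5 6 7 8 > Jenkinsfile
git add Jenkinsfile && git commit -q -m "F"

# Rename the file and change it slightly in the same commit.
git mv Jenkinsfile Jenkinsfile.master
echo "branch=master" >> Jenkinsfile.master
git add Jenkinsfile.master && git commit -q -m "H"

# 50% is the threshold discussed above, spelled out explicitly here.
status=$(git diff --find-renames=50% --name-status HEAD~1 HEAD)
echo "$status"
```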

What this means in practice is that the first time you run this git merge, Git will immediately declare a merge conflict, of the type I like to call a high level conflict. That is, Git will say that Jenkinsfile was renamed in both branches, but to two different names. Git doesn't know whether to use the master version, or the develop version, or both, or neither, or what. It will just stop with a merge conflict. This is OK because it gives you a chance to resolve the conflict, which you should do by selecting the Jenkinsfile.master file as it appears in the master or --ours branch, and selecting the Jenkinsfile.develop file as it appears in the develop or --theirs branch, as your merged results. Put these two files into the index while removing the original name:

git rm --cached Jenkinsfile
git checkout --ours Jenkinsfile.master
git checkout --theirs Jenkinsfile.develop
git add Jenkinsfile.master Jenkinsfile.develop

You have now resolved the conflict by choosing to keep both files as they appear in both branch tips. You can now commit the result.
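If you want to rehearse this before touching the real repository, the whole sequence can be tried in a scratch repository. This is only a sketch: the file body is a placeholder, and it assumes Git detects the rename on both sides, which it should here because the renamed files are byte-for-byte identical to the original.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "pipeline placeholder" > Jenkinsfile
git add Jenkinsfile && git commit -q -m "F"
git branch develop
git mv Jenkinsfile Jenkinsfile.master && git commit -q -m "H: rename for master"
git checkout -q develop
git mv Jenkinsfile Jenkinsfile.develop && git commit -q -m "J: rename for develop"
git checkout -q master

# The merge is expected to stop with a rename/rename conflict.
git merge --no-edit develop >/dev/null 2>&1 || echo "merge stopped with a conflict"

# Resolve it by keeping each side's renamed file and dropping the old name.
git rm -q --cached Jenkinsfile
git checkout --ours Jenkinsfile.master
git checkout --theirs Jenkinsfile.develop
git add Jenkinsfile.master Jenkinsfile.develop
git commit -q --no-edit

merged_files=$(git ls-files)
echo "$merged_files"
```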

Every time you do a merge that uses one of the historic, single-Jenkinsfile commits, you'll need to check that the merge result is correct, or resolve any conflicts. (If it's not correct, immediately after merging, you can fix it in place and use git commit --amend to push the original merge aside and choose a new result as the merge commit. If you don't notice a bad merge, it's a bit more painful, but the recovery is similar in the end anyway. Remember how Git does merges, and work through the two git diffs, to see how putting the right result in any tip commit gets you where you need to go.)

Last, now there's no Jenkinsfile

Now that there's no file named Jenkinsfile, you'll have to redirect any software that wants to use such a file. There are multiple solutions (depending on the software and your OS), including making a symbolic link from Jenkinsfile to the correct per-branch checkout. (Make sure the symbolic link does not get committed, or you'll be right back to the same merge issue when Git tries to merge two potential symlink target changes.)
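A sketch of the symbolic-link variant, assuming a Unix-like system and that each branch carries a file named Jenkinsfile.<branch>. Listing the plain name in .git/info/exclude (which is local to your clone and never committed) is one way to keep the link out of commits:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"

echo "pipeline for master" > Jenkinsfile.master
git add Jenkinsfile.master && git commit -q -m "per-branch Jenkinsfile"

# Ignore the plain name in this clone only, so the link is never added by accident.
mkdir -p .git/info
echo "/Jenkinsfile" >> .git/info/exclude

# Point the plain name at the file that belongs to the current branch.
branch=$(git symbolic-ref --short HEAD)
ln -sf "Jenkinsfile.$branch" Jenkinsfile

link_target=$(readlink Jenkinsfile)
pending=$(git status --porcelain)
echo "Jenkinsfile -> $link_target"
```

The link has to be refreshed after every branch switch. Running the last few commands from a post-checkout hook is one option, though whether that fits depends on how your Jenkins jobs check out the repository.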

Upvotes: 3

tmaj

Reputation: 35135

Keeping a file different between branches is super hard, especially one that changes from time to time.

A better solution is to have settings.environment.json files that contain settings for different environments, and to make your software use a different settings file depending on where it runs.
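As a rough sketch of that idea, a start-up script could pick the file by environment name. The function pick_settings, the environment names dev and test, and the file contents are all hypothetical; your software would have its own way of deciding where it runs:

```shell
set -e
workdir=$(mktemp -d) && cd "$workdir"

# Hypothetical settings files, one per environment, all committed side by side.
echo '{"apiUrl": "https://dev.example.invalid"}' > settings.dev.json
echo '{"apiUrl": "https://test.example.invalid"}' > settings.test.json

# Print the settings file for the given environment name, defaulting to dev.
# Fails (prints nothing) if no such file exists.
pick_settings() {
    env_name=${1:-dev}
    file="settings.$env_name.json"
    [ -f "$file" ] && echo "$file"
}

chosen=$(pick_settings test)
echo "using $chosen"
```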

Having said that, it's best not to keep your production settings in git. The deployment pipeline should contain passwords etc., not your version control system. In this scenario the settings file in all branches contains DEV settings (which are OK to be public) and the pipeline overwrites the settings with TEST and PROD values when it prepares the package for deployment to the target environment.

Upvotes: 1
