mrbolichi

Reputation: 627

How do binary files work on git

There is this LaTeX project I'm managing with git, in which I have several branches, and I use master as the branch that collects all the changes (at the end of the project it will be the final release). Sometimes, when I compile my project on a branch to get the PDF and then merge that branch into master, I get a merge conflict (between master's version of the PDF and the branch's version). Other times, both versions merge seamlessly. What am I doing that causes one situation or the other? How do I ensure that both versions merge without conflicts?

Upvotes: 1

Views: 4344

Answers (2)

torek

Reputation: 487755

As crashmstr says in a comment, binary files won't merge at all. However, there's something you should understand about git merge: it doesn't always merge files. In fact, it doesn't ever really merge files, except as a side effect. It sometimes (not always) merges commits, and only some of those commit merges require it to merge files.

As everyone else has also said so far in comments, "compiled" files (outputs of programs that work on the files that you do want to manage with a version-control system; the modern term for these seems to be build artifact, though artifact has a more general definition) generally shouldn't be committed in Git.

What git merge branch does

When you run git merge, you:

  • are sitting on some commit, usually the tip of a branch (via git checkout branch-name): this commit is the one named by HEAD (try git rev-parse HEAD to see the hash ID, and git symbolic-ref HEAD to see how Git finds your current branch name from HEAD);
  • supply the name of another branch, or any other identifier that resolves to another commit (try git rev-parse branch-name to see how this works).

The merge command then runs a merge strategy (-s recursive by default in older versions of Git; Git 2.34 and later default to -s ort, which finds the merge base the same way). There are some special strategies that do different things, but the default one takes your two commit hashes and grubs through the commit graph, also called the DAG for Directed Acyclic Graph, to find the merge base. You can view this graph with git log --graph or git log --all --decorate --oneline --graph, for which "A DOG" is a useful mnemonic, to remember the All Decorate Oneline Graph options. The merge base is, roughly speaking, "where the two lines in the graph, starting from your HEAD and other commits, first come together again."

We can draw this graph ourselves in a way that looks better on StackOverflow (actually there are lots of ways to draw it):

       C--D--E   <-- branch1
      /
...--B
      \
       F--G--H   <-- branch2

where each uppercase letter represents a commit. Here, the two tips of the two branches are commits E and H, and their merge base is commit B.
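
To see the merge base for yourself, here is a minimal sketch that builds a cut-down version of the graph above in a throwaway repository and asks Git for the base with git merge-base. The file names, the commit messages, and the demo identity are placeholders made up for this example.

```shell
#!/bin/sh
set -eu
# Placeholder identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/branch1   # start on a branch called branch1

# Commit B: the shared starting point.
echo 'base' > main.tex
git add main.tex
git commit -q -m 'B'
base_expected=$(git rev-parse HEAD)

# branch2 starts at B and gets its own commit.
git checkout -q -b branch2
echo 'chapter two' > ch2.tex
git add ch2.tex
git commit -q -m 'F'

# branch1 also moves on from B.
git checkout -q branch1
echo 'chapter one' > ch1.tex
git add ch1.tex
git commit -q -m 'C'

# Ask Git where the two lines of history come together.
merge_base=$(git merge-base branch1 branch2)
echo "merge base: $merge_base"
git log --all --decorate --oneline --graph
```

The hash printed as the merge base should be the one the graph output shows for commit B.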

To merge (as a verb) commits E and H, Git essentially runs git diff B E (to see what changed in branch1 since the base commit) and then a second git diff B H (to see what changed in branch2). If the two diffs touch different files, the merge result is easy: we take each changed file from whichever line changed it, and all the unchanged files from the base B, and pile them together.

If E and H both have changes to one particular file, though, then git merge must combine (merge) those changes to that file. If the file is binary, Git will—at least by default—immediately give up and declare a conflict. This would be the case for your PDF file: if it's different in both E and H, vs B, Git will declare a merge conflict and make you resolve the file.
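
This is roughly how the conflict in the question can be reproduced. The sketch below uses a stand-in paper.pdf (a few bytes with a NUL in them, which is enough for Git to treat the file as binary) and a branch named chapter; both names are invented for the example. It also shows one possible way to resolve the conflict, by keeping the incoming branch's copy.

```shell
#!/bin/sh
set -eu
# Placeholder identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Base commit B: a stand-in "PDF" (the NUL byte makes Git treat it as binary).
printf 'fake-pdf\000 built from base\n' > paper.pdf
git add paper.pdf
git commit -q -m 'B: base build'

# One side rebuilds the PDF on a branch...
git checkout -q -b chapter
printf 'fake-pdf\000 built on chapter\n' > paper.pdf
git commit -q -am 'E: rebuild on chapter'

# ...and the other side rebuilds it on master.
git checkout -q master
printf 'fake-pdf\000 built on master\n' > paper.pdf
git commit -q -am 'H: rebuild on master'

# Both sides changed the same binary file since B, so the merge stops.
if git merge -m 'merge chapter' chapter >/dev/null 2>&1; then
    merge_status=clean
else
    merge_status=conflict
fi
conflicted=$(git diff --name-only --diff-filter=U)
echo "merge: $merge_status, unmerged: $conflicted"

# One way out: keep the incoming branch's copy, then conclude the merge.
git checkout --theirs -- paper.pdf
git add paper.pdf
git commit -q -m 'merge chapter, keeping its paper.pdf'
```

If only one of the two sides had rebuilt the PDF since B, the same merge would go through without a conflict, which would explain why the problem shows up only some of the time.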

In any case, once all conflicts are resolved, git merge normally makes a new merge commit. This is a merge: merge as a noun. A merge commit is a commit with two parents, which we can draw as:

       C--D--E
      /       \
...--B         I
      \       /
       F--G--H

Note that I have left off the branch names this time. The new commit I is the same (in terms of committed files), regardless of which branch name we move to point to it. The branch name that moves, though, is the one we were on when we ran git merge. Hence if we were on branch1, the result is:

       C--D--E
      /       \
...--B         I   <-- branch1
      \       /
       F--G--H   <-- branch2

but if we were on branch2, the result is:

       C--D--E   <-- branch1
      /       \
...--B         I   <-- branch2
      \       /
       F--G--H

In other words, the new commit gets made in the usual way: whatever branch we're on now, that branch name is changed so that it points to the new commit. The new commit itself—commit I, in our case—points back to the previous commit, and for a merge commit, also points back to the other commit as well.

As a subtle but important point, the first parent of the new commit is the one that was the HEAD commit at the time. So while the contents of merge I don't depend on which branch we were on, the first parent does. If we use git log --first-parent, later, we'll follow only the first parent when looking at the history of commits. Since that's the branch we were on, that means we'll go back to either E or to H as appropriate.
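
The parent ordering can be checked directly. This is a small sketch, again in a throwaway repository with made-up file names and one-letter commit messages, that merges branch2 into branch1 and then inspects the two parents of the new merge commit.

```shell
#!/bin/sh
set -eu
# Placeholder identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/branch1

echo 'base' > main.tex
git add main.tex
git commit -q -m 'B'

git checkout -q -b branch2
echo 'chapter two' > ch2.tex
git add ch2.tex
git commit -q -m 'H'
tip2=$(git rev-parse HEAD)

git checkout -q branch1
echo 'chapter one' > ch1.tex
git add ch1.tex
git commit -q -m 'E'
tip1=$(git rev-parse HEAD)

# The two sides changed different files, so this merge needs no manual work.
git merge -q -m 'I' branch2

first_parent=$(git rev-parse HEAD^1)    # the commit we were on (E)
second_parent=$(git rev-parse HEAD^2)   # the commit we merged in (H)
echo "first parent:  $first_parent"
echo "second parent: $second_parent"

# Following first parents only walks I, E, B and never visits H.
fp_log=$(git log --first-parent --format=%s)
echo "$fp_log"
```

Running the same merge while on branch2 instead would swap the two parents, while the files in the merge commit would stay the same.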

When git merge doesn't merge

The drawings above deliberately cover only one of four possible cases.

Suppose that instead of:

       C   <-- branch1
      /
...--B
      \
       D   <-- branch2

or the like, we have:

       C   <-- branch1 (HEAD)
      /
...--B    <-- branch2

Now the merge base commit B is the tip commit of branch2. We're on branch1—that's why it's marked (HEAD)—but there is nothing from branch2 to merge. In this case, git merge says "already up to date" and does nothing.


Or, suppose we have this instead:

       C   <-- branch2
      /
...--B    <-- branch1 (HEAD)

In this case, the merge base of branch1 and branch2 is commit B, again, but branch2 is ahead of branch1. Git can, and by default will, skip the merge and do what it calls a fast-forward instead. It will change the name branch1 so that it points directly to commit C, and check out commit C, giving:

       C   <-- branch2, branch1 (HEAD)
      /
...--B

This "fast forward merge" (which is not a merge at all) happens very often when you are sharing an "upstream" repository (such as one on GitHub) with others who also work and push there. If one of you does some work and pushes, and the other has made no new commits and does a fetch-and-merge, Git sees that the new commits obtained from upstream are "fast-forward-able" and does this instead of doing a true merge.

You can defeat this with git merge --no-ff. Some workflows call for that.
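
The difference between these cases is easy to observe. The following sketch (branch name topic and the file contents are invented for the example) runs the same merge three ways: as a fast-forward, with --no-ff, and once more when there is nothing left to merge.

```shell
#!/bin/sh
set -eu
# Placeholder identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master

echo 'base' > main.tex
git add main.tex
git commit -q -m 'B'

git checkout -q -b topic
echo 'more text' >> main.tex
git commit -q -am 'C'
topic_tip=$(git rev-parse topic)

# master has not moved since B, so a plain merge just slides master forward to C.
git checkout -q master
git merge -q topic
after_ff=$(git rev-parse master)

# Rewind master to B and repeat with --no-ff to force a real merge commit.
git reset -q --hard HEAD~1
git merge -q --no-ff -m 'merge topic' topic
after_noff=$(git rev-parse master)
noff_second_parent=$(git rev-parse master^2)

# Merging again changes nothing: topic is already contained in master.
git merge -q topic
after_again=$(git rev-parse master)

echo "fast-forward: master == topic?  $([ "$after_ff" = "$topic_tip" ] && echo yes || echo no)"
echo "--no-ff:      master == topic?  $([ "$after_noff" = "$topic_tip" ] && echo yes || echo no)"
```

Note that neither the fast-forward nor the "nothing to merge" case ever looks inside a file, so neither of them can produce a conflict on a PDF.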


There is one last possible case, but it's pretty rare: there may be no merge base at all. This happens if you combine two separate repositories, or use git checkout --orphan to start a new independent commit sub-graph. Here we might draw the entire graph as:

A--B--...--G--H   <-- branch1 (HEAD)

I--J--...--O--P   <-- branch2

If you ask Git to merge commits H and P, the result depends on your Git version. Older versions of Git try to merge these two graphs using Git's semi-secret empty tree as a base tree, which may or may not work depending on the contents of H and P. Since Git version 2.9.0, however, Git has started rejecting these by default, requiring --allow-unrelated-histories. (If you supply that flag, the merge goes ahead as before, using the empty tree as the base.)
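
A sketch of this last case, assuming Git 2.9.0 or later (on older versions the first merge below would be attempted rather than refused, and the flag would not exist). The file names are placeholders.

```shell
#!/bin/sh
set -eu
# Placeholder identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/branch1

echo 'one' > a.tex
git add a.tex
git commit -q -m 'H'

# An orphan branch starts a second, unconnected history.
git checkout -q --orphan branch2
git rm -q -rf .
echo 'two' > b.tex
git add b.tex
git commit -q -m 'P'

git checkout -q branch1

# There is no common ancestor, so merge-base prints nothing and fails.
if git merge-base branch1 branch2 >/dev/null; then has_base=yes; else has_base=no; fi

# A plain merge is refused on Git 2.9.0 and later.
if git merge -m 'join' branch2 >/dev/null 2>&1; then plain=accepted; else plain=refused; fi

# With the flag, the merge goes ahead.
git merge -q --allow-unrelated-histories -m 'join histories' branch2
echo "merge base found: $has_base, plain merge: $plain"
git log --oneline --graph
```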

Upvotes: 6

Roland Smith

Reputation: 43495

It is generally considered good practice that anything that can be built from sources is not put under revision control. That is, it should be listed in a .gitignore file.

There are several reasons for this:

  1. It generates a lot of extra data (that can easily be reproduced) to store in the repo.
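  2. You might get merge conflicts on binary files as you have discovered. Binaries usually cannot be merged in a meaningful way. You can, however, choose one of them to replace the other: see git checkout --ours / --theirs for a single conflicted file, or the -X ours and -X theirs merge strategy options. (The separate -s ours strategy discards the other branch's changes entirely, and there is no -s theirs.)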
  3. If the sources are also merged, you'd have to create a new binary afterwards anyway. Otherwise the binary is inconsistent with the source.

For LaTeX repositories, my .gitignore contains at least:

*.aux
*.bbl
*.blg
*.fdb_latexmk
*.fls
*.idx
*.ilg
*.ind
*.lof
*.log
*.lot
*.out
*.toc

(I'm using latexmk for building LaTeX documents.)
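
If the compiled PDF has already been committed, as in the question, adding it to .gitignore is not enough on its own, because ignore rules only apply to untracked files. A possible way to stop tracking it while keeping the local copy is sketched below. The names thesis.tex and thesis.pdf are hypothetical, and the pattern /thesis.pdf is deliberately narrow: a blanket *.pdf would also hide any PDF figures the document includes.

```shell
#!/bin/sh
set -eu
# Placeholder identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master

# A source file plus a stand-in for the compiled output.
printf '%s\n' '\documentclass{article}' > thesis.tex
printf 'fake-pdf\000\n' > thesis.pdf
git add thesis.tex thesis.pdf
git commit -q -m 'sources plus an accidentally committed PDF'

# Stop tracking the PDF but keep the local copy, then ignore future builds.
git rm -q --cached thesis.pdf
printf '%s\n' '/thesis.pdf' >> .gitignore
git add .gitignore
git commit -q -m 'stop tracking build output'

# The PDF is still on disk, is ignored, and no longer shows up as tracked.
if git check-ignore -q thesis.pdf; then ignored=yes; else ignored=no; fi
echo "ignored: $ignored"
git ls-files
```

This would need to be done on every branch that still carries the PDF (or done once on master and merged into the others) before the conflicts stop for good.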

Upvotes: 5
