JustinS

Reputation: 13

How do developers handle large Git repositories with code dependencies not trackable by Git?

My company has a repository for a large C++ client-server application. We have an extra drive on our network containing the latest compiled projects for the entire app, so every developer only has to build their own changes (building the entire application takes a few hours).

This introduces a problem for developers who update (pull) their branches but don't refresh the compiled projects from the network drive (because they copied them to their local drive). The same happens the other way around: developers who don't update their repository but keep their copy of the drive up to date.

Explanation:

There are two important file types: header and CPP files. When running the app, Visual Studio first searches your local drive for the header files and then the network drive, and does the same for the compiled projects. The problem is that you always have all header files on your local drive, because you obviously have the entire project cloned, but you don't have all projects compiled on your drive. This can create a mismatch between the compiled projects and the header files used for running the application.

Example:

Project 1: interface.h, implementation.h, implementation.cpp
Project 2: dependency.cpp

So dependency.cpp uses Project 1's interface.h. Now we have two developers. Developer 1 changes interface.h by expanding a struct and pushes it to the upstream repository. Developer 2 DOES NOT pull the changes but DOES get the latest compiled projects from the network drive. Visual Studio now uses the OLD version of interface.h from Developer 2's local drive AND the NEW compiled version of Project 1. This causes very confusing runtime errors (read access violations) because the structs that are passed to the methods have different sizes.
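To make the size mismatch concrete, here is a minimal sketch using Python's ctypes to model the two struct layouts. The struct name and its members are made up for illustration; the real interface.h is C++ and its contents are not shown in this question.

```python
import ctypes

# Hypothetical layout of a struct as declared in the OLD interface.h.
class RequestOld(ctypes.Structure):
    _fields_ = [("id", ctypes.c_int32), ("length", ctypes.c_int32)]

# Hypothetical layout after Developer 1 expands the struct.
class RequestNew(ctypes.Structure):
    _fields_ = [
        ("id", ctypes.c_int32),
        ("flags", ctypes.c_int32),  # newly added member
        ("length", ctypes.c_int32),
    ]

def layout(struct_type):
    """Return (total size in bytes, {field name: byte offset})."""
    offsets = {
        name: getattr(struct_type, name).offset
        for name, _ in struct_type._fields_
    }
    return ctypes.sizeof(struct_type), offsets

old_size, old_offsets = layout(RequestOld)
new_size, new_offsets = layout(RequestNew)
print("old header:", old_size, old_offsets)
print("new binary:", new_size, new_offsets)
```

Code compiled against the old header allocates 8 bytes and expects length at offset 4, while the new binary expects 12 bytes and reads length at offset 8, which is past the end of what the caller allocated.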

So basically: the precompiled projects always have to be compiled from the same version (commit) your changes are based on. (Not strictly always, since not all changes touch header files, but it is generally a good idea to base your local changes on the commit the network drive was compiled from.)
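One way this rule could be checked automatically is sketched below. It assumes the nightly build writes the commit hash it was built from into a small stamp file next to the binaries (the file name BUILD_COMMIT is made up), and that the upstream branch is called origin/master; neither is stated in the question.

```python
import subprocess
from pathlib import Path

def read_artifact_commit(stamp_file):
    """Read the commit hash the shared binaries were built from.

    Assumption: the nightly build writes that hash into a text file
    (hypothetical name: BUILD_COMMIT) on the network drive.
    """
    return Path(stamp_file).read_text().strip()

def merge_base(branch="HEAD", upstream="origin/master"):
    """Commit where the local branch forked from upstream (git merge-base)."""
    result = subprocess.run(
        ["git", "merge-base", branch, upstream],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def artifacts_match(artifact_commit, base_commit):
    """True when the binaries were built from the commit the branch is based on."""
    # Compare on the shorter length so abbreviated hashes in the stamp file work,
    # but refuse very short prefixes that could match by accident.
    shorter = min(len(artifact_commit), len(base_commit))
    return shorter >= 7 and artifact_commit[:shorter] == base_commit[:shorter]
```

A pre-build step could then call artifacts_match(read_artifact_commit(path_to_stamp), merge_base()) and warn the developer before a mismatched build is run.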

Now the actual question is:

We are switching to GitLab which forces users to use branches to receive a code review. Branches don't really like moving forward in history, but the compiled projects will be renewed every day. So the longer your branch exists, the higher the probability of you running into weird runtime errors. How could you prevent this?

My first ideas would have been:

I can't imagine that we are the only company with problems like this and there has to be some nice solution for this where developers don't have to rebase branches every day and --force push changes.

Upvotes: 1

Views: 417

Answers (3)

bk2204

Reputation: 77024

This can be solved somewhat easily with make and some scripts. How you'll integrate that with Visual Studio is another question, which as a Unix user I'm not qualified to answer.

There's a three-step process:

  1. Check out the commit which the build artifacts are built from.
  2. Copy the build artifacts from the network drive into place as if they were built on the local machine.
  3. Check out the commit you want to build and then build using make.

This works because make looks at timestamps when deciding what files have changed, and Git only checks out files that have changed. As a consequence, make will notice that the build products are newer than all but the changed files, so only the changed files and the files that depend on them need to be rebuilt.
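The timestamp rule that make applies can be sketched in a few lines. This is only an illustration of the basic comparison, not how make is implemented; the function name is made up.

```python
import os

def needs_rebuild(target, dependencies):
    """Mimic make's basic rule.

    Rebuild when the target is missing or when any dependency has a
    newer modification time than the target.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

After step 2 the copied build products are newer than every source file, so nothing is stale. After step 3 only the files Git rewrote get a newer timestamp, and only their targets are reported as needing a rebuild.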

It is also possible to solve this problem with a similar dependency-based build manager and a good hash, like SHA-256 (or, for speed, BLAKE2b): you can upload a manifest of the source files' hashes along with the build products, and then build only the files whose dependencies' hashes have changed. This prevents you from needing to check out an older version, but I'm not aware of any native tools that do this.
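A rough sketch of the manifest idea, assuming the manifest is a plain mapping from relative path to SHA-256 digest and that only *.h and *.cpp files matter (both are assumptions, and the function names are made up). Mapping the changed files to the targets that depend on them is left to the build tool.

```python
import hashlib
from pathlib import Path

def hash_file(path):
    """SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root, patterns=("*.h", "*.cpp")):
    """Map each source file (path relative to root) to its hash."""
    root = Path(root)
    manifest = {}
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            manifest[path.relative_to(root).as_posix()] = hash_file(path)
    return manifest

def changed_files(published, local):
    """Files that were added, removed or modified relative to the published manifest."""
    names = set(published) | set(local)
    return sorted(n for n in names if published.get(n) != local.get(n))
```

The build server would publish build_manifest(...) next to the binaries, and each developer would compare it against a manifest of their own checkout to decide what has to be rebuilt locally.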

I've adopted the latter approach for a Perl-based build system and it worked quite well. Our goal was never to rebuild binaries unless it was required so that users could download the smallest possible partial updates, and it worked.

Anything you do to make the current system work is going to require custom tooling of some sort to build. This is easy with make or a set of shell scripts, but will be trickier on Windows.

Upvotes: 0

Klaus

Reputation: 25663

So dependency.cpp uses Project 1's interface.h. Now we have two developers and Developer 1 makes a change to interface.h and expands a struct and pushes it to the upstream repository. Developer 2 DOES NOT pull the changes but DOES get the latest compiled projects from the network drive. So now Visual Studio uses your OLD implementation of interface.h from your local drive AND uses the NEW compiled version of Project 1.

Congratulations! What are you doing? Sorry to say, but your workflow is fundamentally broken, and the underlying problem of long build times has to be fixed. To me it makes no sense to rely on a single overnight build that many branches are not even part of.

My hints:

  • Use a more elaborate build tool chain! This means that you only compile those parts of the code which depend on the changes in your working branch. There is typically no need to do a full rebuild every time.

  • Use distributed build servers. Tools like distcc do a good job if a lot of build machines are available. I can't speak for the MS environment, but I believe you can set up something like this for MS as well.

  • You may combine this with something like ccache. These tools cache object files. If the hash of the source is unchanged, they do not really build again but return the already available object files, even if they are needed for a different branch.

  • Check out why your build times are so long. Projects which grow over time often have more dependencies than needed. It is always valuable to spend some time refactoring the code base to find parts of the code which can be isolated. Create separate libraries / components which build fully on their own or have a strict hierarchical dependency tree.

  • If you still have build times of several hours, you should think about buying some simple PCs as build workers. If you set up a cluster of 10 PCs with distcc for a full build and reduce the build time by a factor of, let's say, 9, you gain more than by providing overnight build binaries, which is not manageable.
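The caching idea behind tools like ccache can be illustrated with a toy in-memory model. This is not how ccache is implemented; the class, its methods and the compile_fn callback are all made up for the sketch, which only shows a cache keyed by a hash of the preprocessed source plus the compiler flags.

```python
import hashlib

class ObjectCache:
    """Toy cache: identical (source, flags) pairs are compiled only once."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, preprocessed_source, flags):
        digest = hashlib.sha256()
        digest.update("\0".join(flags).encode())
        digest.update(b"\0")
        digest.update(preprocessed_source.encode())
        return digest.hexdigest()

    def get_or_compile(self, preprocessed_source, flags, compile_fn):
        """Return the cached object, calling compile_fn only on a cache miss."""
        key = self._key(preprocessed_source, flags)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compile_fn(preprocessed_source, flags)
        return self._store[key]
```

Because the key depends only on content and flags, a file that is identical on two branches hits the cache on the second branch, which is why switching branches does not force a recompile.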

In my last company we also had long build times on very fat servers (24-CPU servers, each CPU multicore, massively parallel IO cards and so on). After setting up distcc on every developer's PC (~200 PCs on site) we reduced the build time by a factor of 20! It did not scale linearly, as you always have network latency, longer IO times over the network than on a local SSD and so on, but the effect is big enough. Starting at around 5 hours, we were able to get down to 15 minutes for a full build. By using ccache we "find" already compiled binaries and can do local builds in a few seconds. But all this needs good build scripts and good maintenance and setup for branch management. It does not work out of the box!

As said, we did not set up a directory structure or any other repository-related data storage to keep already built binaries. This job was simply done by caching binaries with ccache. On the other hand, we kept local build binaries in a directory structure which reflects local branching. This can result in multiple identical binaries in the build structure, but makes it easier to maintain the build scripts on the developers' PCs.

There is no general ready-to-use concept. But I hope you can pick up some of the ideas we used successfully.

Upvotes: 2

LeGEC

Reputation: 52236

(not an answer, this is a formatted comment)

I didn't get all of your workflow:

  1. How does one update "the changes he is based on"?
  2. How does your build system manage to build incrementally? Does it inspect git diff master HEAD?
  3. How many copies of the artifacts are stored on the external drive? A single copy for "this night's build"?

Upvotes: 0
