memorableusername

Reputation: 502

Efficiently keeping a single git repository while having multiple checked-out versions active concurrently?

TLDR:

I have a single git repo for an application, with many branches for different in-progress versions. I need multiple copies of this repo, each checked out (wrong term?) at a different branch or commit, but without actually keeping multiple full copies with largely redundant files.

Long version:

I am working with a team on an application, attempting to improve its performance. This is a very empirical process: different approaches are each implemented in their own branch, with lots of testing to choose what works well.

I am developing several approaches in their own branches, and also conducting performance and correctness testing of multiple versions of the application. To keep multiple builds around, I simply keep a copy of the repo for each version, build it, and install it to a version-specific location, with scripting that lets me select which version to use in the current environment.
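
Schematically, the setup is something like the following (the paths, the configure-style build, and the use_version helper are simplified placeholders, not my actual scripts):

    # one clone and one install prefix per version (names are illustrative)
    cd ~/builds/app-approach-A                        # clone checked out on branch approach-A
    ./configure --prefix="$HOME/installs/approach-A"  # stand-in for the real build system
    make -j8 && make install

    # helper sourced into the current shell to pick which install is active
    use_version() {
        export PATH="$HOME/installs/$1/bin:$PATH"
        export LD_LIBRARY_PATH="$HOME/installs/$1/lib:$LD_LIBRARY_PATH"
    }
    use_version approach-A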

The correctness testing system disallows concurrent testing in the same source tree, so if a version itself has multiple configurations, they cannot be tested simultaneously. Because this test can sometimes take an hour to complete, I simply copy the source tree for each configuration and, as above, build and install it in a separate location. This allows me to run multiple tests concurrently.
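
In other words, roughly this (the copy names and test command are placeholders):

    # one full copy of the source tree per configuration, so the test harness
    # never sees two runs in the same tree
    cp -r ~/builds/app-approach-A ~/builds/app-approach-A-cfg1
    cp -r ~/builds/app-approach-A ~/builds/app-approach-A-cfg2
    (cd ~/builds/app-approach-A-cfg1 && ./run-correctness-tests cfg1) &
    (cd ~/builds/app-approach-A-cfg2 && ./run-correctness-tests cfg2) &
    wait   # the hour-long runs proceed side by side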

While this is not my ideal setup, my issue is that this is all done on a university HPC cluster, which has some pretty strict disk quotas. A user's /home directory can only contain 15 GB of data, but may have an unlimited number of files. A user's /extra directory (a system-specific storage system) can hold 200 GB of data, but may only have 600 files (including directories) per gigabyte of disk storage (in this case, 200*600, or 120,000 files). Unfortunately, my setup exceeds the disk quota on /home, as well as the file-count quota on /extra.

A friend suggested his process, which is to tar/untar directories as needed (roughly as sketched below). This additional step sounds like another pain point in an already tedious setup.
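
As I understand the suggestion, it amounts to something like this (archive names are placeholders):

    # park an inactive tree as a single file to stay under the file-count quota
    tar -czf app-approach-B.tar.gz app-approach-B && rm -rf app-approach-B
    # ...later, when that version is needed again
    tar -xzf app-approach-B.tar.gz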

The core issue is that the git repository is somewhat large: ~650 MB on disk and more than 7,500 files. I have about 17 of these repositories, along with various libraries and lots of performance-testing material (which is fairly heavyweight), and this pushes everything over the quota limits. If I had a genie, I would ask it for a magical git repository that only physically existed once. Each "copy" would look and behave like a normal copy of the repository, with branches that could be edited and committed to without mucking up any other branch, but in reality each copy would be a 'view' maintained by the system.

Questions:

Does something like my magic wish exist?

How do people with limited time and resources effectively manage this issue? (All this work quacks kind of like a build-farm/CI-testing setup.)

I know that the main answer is going to be "Just delete some of the repositories", which I will, but I really do work with quite a few of these at different times throughout the week, and having to play 'whack-a-repo' is going to be a pain.

Upvotes: 0

Views: 104

Answers (0)
