Pratyush Das

Reputation: 514

How to git clone a subdirectory of a particular branch of a remote repository?

Caveat: the solution should not require configuring Git. (If the required options are already enabled or disabled by default in recent Git versions, and only older versions need to be configured, that is fine.)

The git repository is a remote repository hosted on Github.

Cloning a single branch seems easy with --branch <branchname> --depth 1 or --branch <branchname> --single-branch.

Cloning a subdirectory of a repository is a little bit tricky. I found several StackOverflow questions asking how to do this -

  1. How do I clone a subdirectory only of a Git repository?
  2. Cloning only a subdirectory with git
  3. clone parts of a github project

but the only answers that did not involve jumping through hoops had some downside for my use case -

  1. https://stackoverflow.com/a/52269934/4647107 - I think the file has to be a local repo and won't work with a remote repo.
  2. https://stackoverflow.com/a/37735157/4647107 - the downside is that although it lets me clone a subdirectory of a remote repo, it makes me work with master instead of a branch of my choice.

As an aside, I would prefer not to clone all of the history and all of the branches, since it is a huge repo. I would like to keep --depth 1, or -r HEAD, or some alternative command that does not clone everything. But if that is not possible, that is fine.

Upvotes: 3

Views: 7954

Answers (1)

torek

Reputation: 487805

You literally cannot do what you want: git clone will not clone a subdirectory.

You might be able to do, quite easily, what you need. This depends on what kind of access you have to the Git repository on the remote machine.

As chepner commented, Git stores commits. The commits themselves form chains, which humans tend to call branches. Any given chain ends at its most recent commit, which Git calls the tip commit of a branch, and a branch name simply identifies this most-recent commit, by its hash ID. (Every commit has a unique hash ID.)

Now, each commit, no matter where it is in a repository, contains every file. The files inside each commit are stored in a special, read-only, Git-only form, with compression applied and various tricks so that re-committing the same file takes no extra space. Only Git itself can read this special format.[1]

More precisely, a commit contains every file that should be in that commit, such that if you have the repository as a whole, you can then tell Git: get me commit a123456... (by its hash ID), and Git will extract the commit to a work area. In the work area, you will now have ordinary files in their ordinary everyday file format, which you can work with. (That's why we call it your work area, or working tree or work-tree.)

Note, however, that this means that each commit is also an archive. Git distributions include the git archive command, which turns a Git-specific archive of files into one of two standard non-Git archive formats: zip, or tar. (More formats may be added in the future, but these two have been in Git for pretty much as long as git archive has existed.)
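As a rough sketch of what that looks like on the command line, assuming you already have some clone of the repository: git archive takes a commit (or branch name) and, optionally, one or more paths that limit the archive to part of the tree. The function name and its arguments below are placeholders made up for this illustration.

```shell
# archive_subdir REPO_DIR COMMIT SUBDIR OUT_TAR
# Writes a tar archive holding only SUBDIR as it exists in COMMIT.
# All four arguments are hypothetical placeholders for this sketch.
archive_subdir() {
    repo_dir=$1 commit=$2 subdir=$3 out_tar=$4
    # git archive <tree-ish> [<path>...] restricts the archive to the given paths.
    # The archive goes to stdout, so the redirect is relative to the caller's
    # directory, not to REPO_DIR.
    git -C "$repo_dir" archive --format=tar "$commit" "$subdir" > "$out_tar"
}
```

For instance, archive_subdir ~/src/project my-branch docs /tmp/docs.tar would, under these assumptions, produce a tar file containing just docs/ from the tip of my-branch.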

This means that anyone who has a clone of the repository can turn any commit into one of these archives. You can then use the archiver itself—unzip or tar—to extract just the desired subset of the files in this one archive.
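A minimal sketch of that extraction step with tar, assuming the archive is an uncompressed tar file and the subdirectory sits at the path you name inside it. The function and argument names are invented for illustration.

```shell
# extract_subdir ARCHIVE SUBDIR DEST
# Unpacks only SUBDIR (and everything below it) from a tar archive into DEST.
# The names are hypothetical; adjust them to your own layout.
extract_subdir() {
    archive=$1 subdir=$2 dest=$3
    mkdir -p "$dest" || return 1
    # Listing a member directory after the archive tells tar to extract
    # just that directory instead of the whole archive.
    tar -xf "$archive" -C "$dest" "$subdir"
}
```

With a zip archive, the equivalent would be something like unzip archive.zip 'docs/*' -d out, where the pattern selects the members to extract.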

If the site that has the Git repository lets you run arbitrary commands, go there and make an archive from the commit(s) whose file(s) you want, then manipulate the archive there. If the site is GitHub or similar, note that they offer an interface to obtain a tar or zip archive from any given commit, and use that to copy this archive to your machine and manipulate that archive there.

You wrote: "The git repository is a remote repository hosted on Github."

So, you can grab a zip archive of any commit using any web browser: navigate to the desired commit, click "Code" (formerly labeled "Clone or download"), select "Download ZIP", and the browser should save the resulting zip file somewhere.

(To automate this, note the URL from which the zip file gets downloaded. It will probably have a commit hash ID embedded in it. Use a program like curl to do your own downloading without firing up a browser.)
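Here is one way the automated download might look. It is a sketch, not a guaranteed interface: the URL layout (github.com/OWNER/REPO/archive/REF.tar.gz) is an assumption based on how GitHub serves archives at the time of writing, and the function names are made up. Because GitHub wraps every file in a single top-level directory, the sketch unpacks the whole archive into a scratch directory, strips that wrapper, and then keeps only the subdirectory you asked for.

```shell
# github_archive_url OWNER REPO REF
# Prints the assumed URL of a tar.gz snapshot of REF (branch, tag, or commit hash).
github_archive_url() {
    printf 'https://github.com/%s/%s/archive/%s.tar.gz\n' "$1" "$2" "$3"
}

# fetch_subdir OWNER REPO REF SUBDIR DEST
# Downloads the snapshot and moves SUBDIR to DEST (DEST must not exist yet).
fetch_subdir() {
    owner=$1 repo=$2 ref=$3 subdir=$4 dest=$5
    work=$(mktemp -d) || return 1
    # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects.
    # --strip-components=1 drops the single wrapper directory inside the archive.
    curl -fsSL "$(github_archive_url "$owner" "$repo" "$ref")" |
        tar -xzf - -C "$work" --strip-components=1 || { rm -rf "$work"; return 1; }
    mv "$work/$subdir" "$dest"
    status=$?
    rm -rf "$work"
    return "$status"
}
```

A call such as fetch_subdir some-owner some-repo my-branch docs ./docs would then leave only that branch's docs directory on disk, with no Git metadata.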

Note that these archives are not Git repositories and you cannot do any Git work with them. If you only plan to extract a subset of the files, though, the result won't be usable against any clone of the repository anyway, at least not if done like this.[2]

If neither of these is available, you can use git clone -b <branch-name> --single-branch --depth 1 <url> to make a shallow clone, one that copies only the single commit identified by the named branch. (With --depth, --single-branch is already implied, so spelling it out is harmless but optional.) Now you have a very limited clone with just one commit in it, so you can run git archive on that commit if you like. Of course, at this point the clone has already checked out that one commit, so you can simply move the entire desired subdirectory out of the work-tree and then remove the Git repository.
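The clone-then-discard approach could be sketched as follows. The function name, its arguments, and the scratch-directory layout are assumptions made for this example; only the git clone options come from the text above.

```shell
# clone_subdir URL BRANCH SUBDIR DEST
# Shallow-clones BRANCH, keeps SUBDIR as DEST, and throws the clone away.
# DEST must not exist yet. All names here are hypothetical.
clone_subdir() {
    url=$1 branch=$2 subdir=$3 dest=$4
    work=$(mktemp -d) || return 1
    # --depth 1 fetches only the tip commit of BRANCH, with no earlier history.
    git clone --quiet --branch "$branch" --single-branch --depth 1 \
        "$url" "$work/repo" || { rm -rf "$work"; return 1; }
    # The clone already has the commit checked out, so the files are plain files.
    mv "$work/repo/$subdir" "$dest"
    status=$?
    # Removing the scratch directory also removes the .git repository.
    rm -rf "$work"
    return "$status"
}
```

Note that this still transfers every file in that one commit, not just the subdirectory; it only avoids downloading the rest of the history and the other branches.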

No matter what you do, you cannot put this stuff back into the original Git repository without having a real—possibly shallow—clone of the original Git repository. Your question never mentions what you plan to do with this subdirectory that you have extracted from one particular commit.


[1] The format is public, so anyone can write a program to read it. A program to read it is basically Git, though, so you might as well just use Git, especially because Git reserves the right to have future formats: if you write your own version of Git and Git adds a new format next year that works better for some cases, you'll have to update your own program too.

[2] git subtree provides tools for doing something very different in mechanism, but similar in terms of some of its goals. Subtrees that have been split can be re-incorporated later as long as you follow a lot of very particular rules. To use git subtree, though, you would need a full clone, which you've already rejected as an option. The rules themselves are also fiddly ("jumping through hoops"), which you find undesirable.

Upvotes: 1
