gyuunyuu

Reputation: 684

How to know if a Git repository has submodules before it is cloned?

Is it possible to know if a Git repository has submodules before we clone it, and if so what they are and what their URL is?

Also, can submodules have their own submodules, and if so, how many levels of hierarchy does Git permit?

Upvotes: 1

Views: 1276

Answers (2)

torek

Reputation: 488183

There isn't a perfect answer to this because the presence of submodules is tricky. ElpieKay's answer, to check for a .gitmodules file, is a good method for getting close enough for all practical purposes. I'm just going to get into the weedy part here.

The problem here is that a submodule is represented in a commit by an entity that Git calls, internally, a gitlink. A gitlink holds a path name (what Git considers to be a file name, e.g., path/to/submodule¹) and the commit hash ID of some commit in some Git repository.² That lets us dive a bit deeper into your question.

... and what their URL is?

This part is tricky because the initial URL for a submodule has to come from somewhere, but once the submodule has been git clone-ed, Git no longer cares what the URL is. The submodule is now just a Git repository.

The .gitmodules file supplies the initial information for a git clone, but the .gitmodules file is a file, and therefore there is a copy of it in every commit. The copies in each commit need not agree with each other. Perhaps the .gitmodules file in commit X says to clone github.com/repo1 but the .gitmodules file in commit Y says to clone github.com/repo2. Or, perhaps some commits lack a .gitmodules file entirely.
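
A minimal sketch of that situation, assuming a throwaway repository: the submodule name lib, the commit messages X and Y, and the two URLs are placeholders borrowed from the paragraph above, not anything from a real project. It reads the .gitmodules copy stored in each commit with git config --blob:

```shell
#!/bin/sh
set -e

# Throwaway repository for the demo.
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Helper: commit with a fixed demo identity so no global config is needed.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

# Commit X: .gitmodules says to clone repo1.
git config -f .gitmodules submodule.lib.path lib
git config -f .gitmodules submodule.lib.url https://github.com/repo1
git add .gitmodules
commit "X"
x=$(git rev-parse HEAD)

# Commit Y: same file, different URL.
git config -f .gitmodules submodule.lib.url https://github.com/repo2
git add .gitmodules
commit "Y"
y=$(git rev-parse HEAD)

# Read the copy of .gitmodules that is stored in each commit.
url_x=$(git config --blob "$x:.gitmodules" --get submodule.lib.url)
url_y=$(git config --blob "$y:.gitmodules" --get submodule.lib.url)
echo "X: $url_x"
echo "Y: $url_y"
```

Which URL you see therefore depends on which commit you look at, so "the" URL of a submodule is really "the URL recorded in this particular commit".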

One problem in particular is that Git will detect a subdirectory that has its own .git repository in it, and when you're in the higher level Git repository, git add will simply add that sub-repository as a gitlink entry. This does not create a .gitmodules file. The end result is that people create repositories that "want" a submodule, but lack a .gitmodules file, in any commit at all. When you clone one of these, you just don't get the sub-Git at all. If that's OK, well, then it's OK. 😀 You can see these gitlinks in some web-based repository browsers (such as GitHub's) as a little icon. There are a lot of different icons for this.
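
To make the "gitlink without .gitmodules" case concrete, here is a rough sketch in a throwaway repository. The commit hash is an arbitrary placeholder (a gitlink's target does not have to exist in the superproject), and path/to/submodule is just the example path used above. It records a gitlink directly in the index, then lists the mode-160000 entries in the commit, which is how gitlinks show up in git ls-tree:

```shell
#!/bin/sh
set -e

# Throwaway repository for the demo.
demo=$(mktemp -d)
cd "$demo"
git init -q .

# Record a gitlink directly in the index. Mode 160000 marks a gitlink;
# the hash below is a made-up placeholder commit ID.
fake_commit=1234567890123456789012345678901234567890
git update-index --add --cacheinfo 160000,"$fake_commit",path/to/submodule
git -c user.name=demo -c user.email=demo@example.com commit -q -m "add gitlink only"

# Gitlinks are the tree entries whose mode is 160000.
gitlinks=$(git ls-tree -r HEAD | awk '$1 == "160000"')
echo "$gitlinks"

# This commit has no .gitmodules, so no URL is recorded anywhere.
if git cat-file -e HEAD:.gitmodules 2>/dev/null; then
    has_gitmodules=yes
else
    has_gitmodules=no
fi
echo "has .gitmodules: $has_gitmodules"
```

Note that this check needs the commit's tree, so it only works once you have a clone (or some other way to read the tree); it is the thorough check, not the before-cloning one.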


¹ Git filenames contain embedded slashes like this. This is not a folder named path containing a folder named to containing a gitlink named submodule; it's just a gitlink file named path/to/submodule. The reason for this has to do with Git's index, which we won't go into here.

² With the way that SHA-256 is being introduced into Git, I foresee some problems coming up here. We may need a new kind of gitlink.

Upvotes: 2

ElpieKay

Reputation: 30868

Hosting services like GitHub provide a Git repository browser. In the browser we can see the files and folders of a specific commit. If a repository has one or more properly registered submodules, it has a .gitmodules file in its top-level directory. The names, paths, and URLs of the submodules are listed in .gitmodules.

Some hosting services (not including GitHub) also allow git archive --remote to download specific files.

git archive --remote=<url_to_repository> --format=zip master .gitmodules | gunzip -

If .gitmodules exists on master of the repository, the command prints its content. We can check if submodules are listed in it.
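
Once you have the file's content, git config can parse it for you. A small sketch, assuming the content has been saved to a temporary file; the submodule name libfoo, its path, and its URL are made-up sample values:

```shell
#!/bin/sh
set -e

# Sample .gitmodules content saved to a temporary file (values are invented).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
[submodule "libfoo"]
    path = libs/foo
    url = https://example.com/libfoo.git
EOF

# Print every submodule's path and URL as "key value" lines.
listing=$(git config -f "$tmp" --get-regexp '^submodule\..*\.(path|url)$')
echo "$listing"
```

An empty result (or a missing file) means no submodules are registered in that commit's .gitmodules.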

A submodule can have its own submodules. The number of levels is unlimited in theory. A commit records only its own submodules and doesn't know or care whether any of them has submodules of its own. There could be errors if the checked-out path is too long, which limits the levels in practice.

Upvotes: 4
