Reputation: 684
Is it possible to know if a Git repository has submodules before we clone it, and if so what they are and what their URL is?
Also, can submodules have their own submodules, and if so, how many levels of hierarchy does Git permit?
Upvotes: 1
Views: 1276
Reputation: 488183
There isn't a perfect answer to this because the presence of submodules is tricky. ElpieKay's answer, to check for a .gitmodules file, is a good method for getting close enough for all practical purposes. I'm just going to get into the weedy part here.
The problem here is that a submodule is represented in a commit by an entity that Git calls, internally, a gitlink. A gitlink holds a path name (what Git considers to be a file name, e.g., path/to/submodule) [1] and the commit hash ID of some commit in some Git repository. [2] Notably, it does not hold a URL. That lets us dive a bit deeper into your question.
... and what their URL is?
This part is tricky because the initial URL for a submodule has to come from somewhere, but once the submodule has been cloned, Git no longer cares what the URL is. The submodule is now just a Git repository.
The .gitmodules file supplies the initial information for a git clone, but .gitmodules is an ordinary tracked file, and therefore there is a copy of it in every commit. The copies in each commit need not agree with each other. Perhaps the .gitmodules file in commit X says to clone github.com/repo1 but the .gitmodules file in commit Y says to clone github.com/repo2. Or perhaps some commits lack a .gitmodules file entirely.
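To see how the recorded URLs differ from commit to commit in a repository you already have, something like the following sketch may help. It assumes Git is installed and that the current directory is inside the repository; the function name submodule_urls_at is made up for illustration and is not a Git command.

```shell
# submodule_urls_at is a hypothetical helper name, not part of Git.
# Prints the submodule URL entries stored in the .gitmodules of one commit.
submodule_urls_at() {
    rev=$1
    # --blob makes git config read from a blob in the repository
    # (here: the .gitmodules file as it exists in commit $rev).
    # A commit without .gitmodules makes git config fail, so print nothing.
    git config --blob "$rev:.gitmodules" \
        --get-regexp '^submodule\..*\.url$' 2>/dev/null || true
}

# Usage, where X and Y stand for any two commits:
#   submodule_urls_at X
#   submodule_urls_at Y
```

Each output line has the form "submodule.<name>.url <url>", so the results for two commits can be compared with diff.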
One problem in particular is that Git will detect a subdirectory that has its own .git repository in it, and when you're in the higher-level Git repository, git add will simply add that sub-repository as a gitlink entry. This does not create a .gitmodules file. The end result is that people create repositories that "want" a submodule but lack a .gitmodules file in any commit at all. When you clone one of these, you just don't get the sub-Git at all. If that's OK, well, then it's OK. 😀 You can see these gitlinks in some web interfaces (such as GitHub's) as a little icon. There are a lot of different icons for this.
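In a repository that has already been cloned, the gitlinks themselves can be listed without relying on .gitmodules at all, since a gitlink shows up in a tree listing with mode 160000. A minimal sketch, assuming Git and awk are available and the current directory is inside the repository; list_gitlinks is a made-up name:

```shell
# list_gitlinks is a hypothetical helper name, not part of Git.
# Prints "<commit-hash> <path>" for every gitlink recorded in a commit.
list_gitlinks() {
    rev=${1:-HEAD}
    # Each ls-tree -r line is "<mode> <type> <hash><TAB><path>".
    # Gitlinks are the entries whose mode is 160000.
    git ls-tree -r "$rev" |
        awk '$1 == "160000" { hash = $3; sub(/^[^\t]*\t/, ""); print hash, $0 }'
}

# Usage:
#   list_gitlinks            # gitlinks in HEAD
#   list_gitlinks v1.0       # gitlinks in some other commit or tag
```

Any path printed here that has no matching entry in that commit's .gitmodules is one of the "wants a submodule but has no URL" cases described above.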
[1] Git file names contain embedded slashes like this. This is not a folder named path containing a folder named to containing a gitlink named submodule; it's just a gitlink entry named path/to/submodule. The reason for this has to do with Git's index, which we won't go into here.
[2] With the way that SHA-256 is being introduced into Git, I foresee some problems coming up here. We may need a new kind of gitlink.
Upvotes: 2
Reputation: 30868
Hosting services like GitHub provide a Git repository browser. In the browser we can see the files and folders of a specific commit. If a repository has one or more properly registered submodules, it has a .gitmodules file in its root directory. The names, paths, and URLs of the submodules are listed in .gitmodules.
Some hosting services (not including GitHub) also allow git archive --remote to download specific files.
git archive --remote=<url_to_repository> --format=zip master .gitmodules | gunzip -
If .gitmodules exists on master of the repository, the command prints its content. We can check if submodules are listed in it.
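Once a copy of .gitmodules has been obtained this way (or saved from the hosting service's browser), git config can read it directly, even outside any repository. The sketch below assumes Git is installed and that submodule names contain no whitespace; list_submodules is a made-up name.

```shell
# list_submodules is a hypothetical helper name, not part of Git.
# Prints "<path> <url>" for each submodule listed in a .gitmodules file.
# Assumption: submodule names contain no whitespace.
list_submodules() {
    file=${1:-.gitmodules}
    # -f reads the given file instead of the usual config files.
    git config -f "$file" --get-regexp '^submodule\..*\.path$' |
    while read -r key path; do
        # key looks like "submodule.<name>.path"; strip both ends.
        name=${key#submodule.}
        name=${name%.path}
        url=$(git config -f "$file" --get "submodule.$name.url")
        printf '%s %s\n' "$path" "$url"
    done
}

# Usage:
#   list_submodules /path/to/downloaded/.gitmodules
```

An empty result means the file lists no submodules. It does not rule out gitlinks that were added without a .gitmodules entry.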
A submodule can have its own submodules. The number of levels is unlimited in theory. A commit records only its own submodules and doesn't know or care whether any of them has submodules of its own. There could be errors if the checked-out path becomes too long, which limits the number of levels in practice.
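Because each level only knows about its own submodules, the nesting can only be discovered level by level. As a rough sketch, once the nested submodules have been checked out (for example with git submodule update --init --recursive), the following walks the .gitmodules files in the working tree and prints each submodule path with its depth. The name walk_submodules is made up, and the sketch assumes a shell that supports local (such as bash) and submodule names without whitespace.

```shell
# walk_submodules is a hypothetical helper name, not part of Git.
# Prints "<depth> <path>" for every submodule found below a checked-out
# directory, descending into nested submodules.
# Assumptions: the shell supports "local"; names contain no whitespace.
walk_submodules() {
    local dir="$1" depth="${2:-1}" key path
    # A directory without .gitmodules has no registered submodules.
    [ -f "$dir/.gitmodules" ] || return 0
    git config -f "$dir/.gitmodules" --get-regexp '^submodule\..*\.path$' |
    while read -r key path; do
        printf '%s %s\n' "$depth" "$dir/$path"
        # Recurse: the submodule may carry its own .gitmodules.
        walk_submodules "$dir/$path" $((depth + 1))
    done
}

# Usage, from the top of the superproject's working tree:
#   walk_submodules .
```

Submodules that have not been initialized are empty directories, so the walk stops there without reporting anything beneath them.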
Upvotes: 4