Can I return to previous commit in a single submodule?

Question

I would like to have my application set up into some services, but all in one repository. So I wanted to add one submodule for each service (I am only having two for the moment). So my project hierarchy is:

- root
|--rootDoc.txt
|--.git
|
|---- sub1
    |--sub1.txt
    |--.git
|---- sub2
    |--sub2.txt
    |--.git

Now I made the following changes:

change sub1.txt
commit sub1 submodule
push main to master
change sub2.txt
commit sub2 submodule
push main to master

Now I'd like to return sub1-submodule to the state before the last changes in it but keep sub2 in its current state. If that is not possible for submodules, is there another solution for my problem or would I need to use two completely different repositories?

Edit: What I tried:

c:\dev\root\sub1>git log
commit a172db9a5f11738383d28e082db2c22d3f2d3e75 (HEAD -> master, origin/master, origin/HEAD)
Author: %me%
Date:   Sun Dec 2 20:24:59 2018 +0100

    updated sub2

commit 0becb718a4db9c73b03fa65e332f20c7715463cb
Author: %me%
Date:   Sun Dec 2 20:23:40 2018 +0100

    sub1 actual now

commit 85d68703bff1af2b95a7ee8d7926d7fd700b1da4
Author: %me%
Date:   Sun Dec 2 20:10:50 2018 +0100

    Added submodules

commit b3b67de3e54f1db7e56d516af2baaf50541f7ca2
Author: %me%
Date:   Sun Dec 2 20:05:44 2018 +0100

    initial commit

c:\dev\root\sub1>git checkout 85d68703bff1af2b95a7ee8d7926d7fd700b1da4
Note: checking out '85d68703bff1af2b95a7ee8d7926d7fd700b1da4'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 85d6870 Added submodules

After this checkout my sub2 is also changed although I checked out from the sub1-dir (where the other submodule is located).

torek · Accepted Answer

You can do what's in your title question ("return to previous commit in a single submodule"). Every submodule is an independent repository in its own right. What's not clear is what you have actually done. I suspect that you have made one repository with several sub-directories, and perhaps another two repositories that live under the one repository but are not submodules.
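One way to check that suspicion is to ask Git, from inside sub1, which repository it thinks it is in. This is only a sketch: the temporary directory and the sub1 name are stand-ins for your own layout, not something taken from your setup.

```shell
#!/bin/sh
# Sketch: is sub1 its own repository, or just a directory of the outer one?
set -e

root=$(cd "$(mktemp -d)" && pwd -P)   # throwaway stand-in for c:\dev\root
cd "$root"
git init -q
mkdir sub1

# sub1 has no .git of its own yet, so Git walks upward and finds root's .git.
plain=$(git -C sub1 rev-parse --show-toplevel)
echo "before git init sub1: $plain"

# Once sub1 is a repository in its own right, Git stops at sub1.
git init -q sub1
nested=$(git -C sub1 rev-parse --show-toplevel)
echo "after git init sub1:  $nested"
```

If the first form is what you see in your real sub1, then git log and git checkout run there are operating on the outer repository, which would explain why sub2 changed as well.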

It's worth stepping back here and defining some terms. I'm not really thrilled with Git's terminology here ("submodule" and "superproject" are kind of clumsy) but I will stick with them.

A submodule is a Git repository.
A superproject is a Git repository.

Obviously this is not much help, 😀 so let's add some qualifiers:

A submodule is a Git repository that is currently being used by another Git repository which we call the superproject. There is exactly one superproject for this submodule Git repo.
A superproject is a Git repository that is currently using another Git repository as a submodule. There may be multiple submodules within this superproject.

(This leads to the possibility that some Git repository is simultaneously a submodule and a superproject. This is a bit of a nightmare and you should try to avoid it, but it does happen.)

Now, when a superproject makes demands on another Git repository that the superproject is using as a submodule, the way the superproject Git does this is—at least normally—to command the submodule Git to enter detached HEAD mode. Any Git repository can be in this state, but most normal repositories aren't, except when you're in the middle of a long rebase, or are using git checkout commit-or-tag to move to a specific historic commit. Normally, when developing, you're on a branch like master or develop, which is the opposite of "detached HEAD": here the name HEAD is figuratively attached to the branch name. So git checkout master attaches your HEAD to master, and git checkout develop attaches your HEAD to develop.

(HEAD, written in all-capitals like this, always—always—refers to the current commit in the current Git repository. The underlying implementation of this is that the .git directory that holds the repository has a file named HEAD in it. This .git/HEAD file either contains a branch name, in which case you're on that branch, or it contains a commit hash ID, in which case you have a detached HEAD at that commit. Since Git stores this in a file, it's possible on Windows and MacOS to use head in all lower-case, but it's better to stick with the all-capitals version. If you want a shortcut that's easier to type, @ by itself also means HEAD.)
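To see the two states for yourself, here is a small sketch that prints .git/HEAD while attached and while detached. It assumes a throwaway repository, a made-up "demo" identity, and Git's default files-based ref storage; the branch name master is forced only so the output is predictable.

```shell
#!/bin/sh
# Sketch: watch .git/HEAD change between attached and detached states.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)                         # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name for the demo
git commit -q --allow-empty -m "initial commit"

attached=$(cat .git/HEAD)                 # HEAD names a branch: we are on master
echo "attached:    $attached"

git checkout -q --detach                  # same commit, but HEAD no longer names a branch
detached=$(cat .git/HEAD)                 # HEAD now holds a raw commit hash
echo "detached:    $detached"

git checkout -q master                    # re-attach HEAD to the branch
reattached=$(cat .git/HEAD)
echo "re-attached: $reattached"
```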

When you want to use a regular repository, in a system in which you start by cloning the repository (rather than creating it from scratch), you do this:

git clone <url> [<path>]

e.g., git clone https://github.com/git/git.git to clone the Git repository for Git via GitHub. This creates a git directory wherever you are right now. If for some reason you wanted the clone to be put in /tmp/git you would use git clone https://github.com/git/git.git /tmp/git. So there are two key items that Git needs, in order to make a clone:

a URL, and
a path where the clone should go.

The URL is typically an https:// or ssh:// style URL, listing some upstream host / server (or cloud-system such as GitHub) and a path on that host / server. (Note that git@github.com:path/to/repo.git is just shorthand for ssh://git@github.com/path/to/repo.git. The two mean exactly the same thing.)

The process of adding a submodule to an existing repository is much the same:

git submodule add <url> [<path>]

The url here will also typically begin with https:// or ssh://. The path is the location within your repository, i.e., the place to put the submodule.

The reason for this URL-and-path is that git submodule add will in fact run git clone for you. The clone it makes will be an ordinary Git repository, because a submodule is an ordinary Git repository. Git just needs to know where do I get the clone from and where should I put it within this repository.

The other thing that git submodule add will do—the extra part that makes your current Git repository act as a superproject to that submodule—is to create or update a file named .gitmodules, and to add an entry to your superproject's index.

Note that the subproject does not have to know about its superproject, and in the bad old days, really didn't know anything about it. (In modern Git the subproject's .git directory gets migrated into the superproject's .git directory. The .git that would be found at the submodule is replaced with a file that points the submodule to its superproject's holding area.)

Anyway, the side effect of all of this is that the set of commits in a submodule is determined by the contents of the submodule alone. The superproject has no effect on it! The submodule is just a clone of some existing URL.

This is not the way you're trying to use submodules, but before we get to that, let's look at the rest of the normal operation of all of this. We have some superproject—a local Git repository that is perhaps a clone of some origin repository—where we make our superproject commits. Within this superproject, we have now created a file named .gitmodules that gives the URL and path of another Git repository. Let's say the path is dir/sub. If we run:

cd dir/sub

we find that we are now in the work-tree of a separate clone that has its own origin/master and master and so on; but this clone has a detached HEAD. Running git log shows the detached-HEAD commit, then its parent(s) and their parent(s) and so on, as if history ends at whatever commit we have checked out as the detached HEAD. This is our submodule Git repository.

If we cd back up into the original repository:

cd -    # or cd ../..

we're back in the main repository. Using the ordinary file system tools shows us that dir/sub exists now and is a directory. There is a file (or if your Git is older, a directory) named dir/sub/.git. If it's a file, it contains one line reading:

gitdir: ../../.git/modules/sub

Running git status shows two added files:

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        new file:   .gitmodules
        new file:   dir/sub

But inspecting the index—which is a little tricky; I'll use git ls-files here—shows that dir/sub is not a directory at all:

$ git ls-files --stage dir/sub
160000 50298bbf97b317f17b3e1cf9287e912fb5de886e 0       dir/sub

Entries with mode 160000 are what Git calls a gitlink.

If you know that dir/sub is a gitlink, you can view its hash ID more directly using git rev-parse. The syntax :0:dir/sub means "dir/sub from the index (at slot zero)":

$ git rev-parse :0:dir/sub
50298bbf97b317f17b3e1cf9287e912fb5de886e

These tell us the same thing, except that if dir/sub weren't a submodule, we would be able to see that in the git ls-files --stage output.
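The whole sequence so far (add a submodule, then inspect the gitlink) can be reproduced in a scratch directory. This is a sketch under some assumptions: the "upstream" repository is a local stand-in for a real URL, the identity is made up, and protocol.file.allow=always is passed because recent Git versions refuse local-path submodule clones by default.

```shell
#!/bin/sh
# Sketch: add a submodule, then look at the gitlink entry in the index.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A stand-in "upstream" repository that plays the role of the URL.
git init -q upstream
git -C upstream commit -q --allow-empty -m "library commit"

# The superproject.
git init -q super
cd super
git commit -q --allow-empty -m "initial commit"

# git submodule add runs the clone and writes .gitmodules plus the index entry.
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" dir/sub

mode=$(git ls-files --stage dir/sub | cut -d' ' -f1)   # first field is the mode
gitlink=$(git rev-parse :0:dir/sub)                    # hash stored in the index
sub_head=$(git -C dir/sub rev-parse HEAD)              # commit the clone is on

echo "mode:     $mode"
echo "gitlink:  $gitlink"
echo "sub HEAD: $sub_head"
cat .gitmodules
```

The gitlink hash and the submodule's own HEAD should agree right after the add, since the index entry is taken from the submodule's HEAD.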

This is how Git generally envisions submodule usage

The general idea here is that, in your superproject, you use some sort of third-party library (say, Google gRPC) that you personally don't control in any way. Instead, you write your software and make it work with one particular version of that library:

$ (cd dir/sub; git checkout v3.2.1)

By checking out some particular tag in the submodule, you move the detached HEAD to that particular commit. Then you make any changes needed to your own project—your superproject—to make it work with v3.2.1 or whatever version that is:

$ ... make some changes ...
$ git add ... files ...

Having now updated your files, you now also update the gitlink entry that says that your superproject Git should git checkout the one particular commit that you have right now in your submodule:

$ git add dir/sub     # update the gitlink to whatever hash v3.2.1 represents

Now when you make a new commit, the superproject commit continues to list the other repository (with its URL, whatever that is, and its path, dir/sub) in your .gitmodules, and this same commit declares: This commit works with the submodule detached at the hash ID stored in the gitlink.

So, whenever someone runs git clone on your superproject, and then does a git checkout of that particular superproject commit, a subsequent:

$ git submodule update

will make sure that dir/sub has that particular gitlink-ed commit checked out, as the detached HEAD. Now your superproject and submodule are in sync, and you can build.
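Here is a sketch of that pin-then-update cycle end to end. The tag name v1.0, the local "upstream" repository, and the demo identity are all made up for illustration; protocol.file.allow=always is only needed because the stand-in URL is a local path.

```shell
#!/bin/sh
# Sketch: pin a submodule at a tag, let it drift, then restore it.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

git init -q upstream
git -C upstream commit -q --allow-empty -m "release 1"
git -C upstream tag v1.0                       # hypothetical release tag
git -C upstream commit -q --allow-empty -m "work after release"
tip=$(git -C upstream rev-parse HEAD)

git init -q super
cd super
git commit -q --allow-empty -m "initial commit"
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" dir/sub

# Pin: detach the submodule at the tag, then record that hash as the gitlink.
git -C dir/sub checkout -q v1.0
git add dir/sub
git commit -q -m "use library v1.0"
pinned=$(git rev-parse HEAD:dir/sub)

# Drift: someone moves the submodule to a different commit.
git -C dir/sub checkout -q --detach "$tip"
drifted=$(git -C dir/sub rev-parse HEAD)

# The superproject commands the submodule back to the recorded commit.
# (The -c option is harmless here if no fetch turns out to be needed.)
git -c protocol.file.allow=always submodule --quiet update
restored=$(git -C dir/sub rev-parse HEAD)

echo "pinned:   $pinned"
echo "drifted:  $drifted"
echo "restored: $restored"
```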

This is not the way you're trying to use submodules

In your case, you already have the submodule Git repositories. They may or may not have a suitable upstream repository. They exist at sub1 and sub2. I'll use, as my example, dir/sub again, though:

$ git submodule add ./dir/sub dir/sub
Adding existing repo at 'dir/sub' to the index

The URL here, ./dir/sub, is pretty useless to anyone else. (It has to start with ./ or ../ to be relative to the current working directory—Git refuses to add the submodule without the leading ./.)

At this point, the same thing happens as with a normal URL: Git has created or updated your .gitmodules to list the URL and path:

$ cat .gitmodules
[submodule "dir/sub"]
        path = dir/sub
        url = ./dir/sub

and put the hash ID that corresponds to the submodule's HEAD into the index to serve as the next committed gitlink entry:

$ (cd dir/sub; git rev-parse HEAD)
1fdcf14961c81d03496b359389058410f0169782
$ git rev-parse :0:dir/sub
1fdcf14961c81d03496b359389058410f0169782
$ git status --short
A  .gitmodules
A  dir/sub

Thus, if you now make a new commit at this point, the new commit will have the .gitmodules and index entries needed to make this Git repository attempt to manage—or clone, if it's missing—the other Git repository into dir/sub, based on the URL ./dir/sub.

This URL is of course entirely useless unless there's already a Git repository at dir/sub, but that's how we tell this Git that it is being the superproject to another Git repository at dir/sub. You can use Git this way, and as long as you already have another Git repository at dir/sub, your superproject Git will be OK with that and will command it. The command your superproject Git will issue to the submodule Git is: Check out this one specific commit, as a detached HEAD.
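Applied to a layout like yours, the in-place adoption might look like the following sketch. The names root and sub1 mirror the question; the temporary directory and demo identity are assumptions. No clone happens here, because the repository already exists at the path.

```shell
#!/bin/sh
# Sketch: register an already-existing nested repository as a submodule.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

root=$(mktemp -d)
cd "$root"
git init -q
git commit -q --allow-empty -m "initial commit"

# sub1 is an ordinary repository that happens to live inside root.
git init -q sub1
git -C sub1 commit -q --allow-empty -m "sub1 first commit"

# The leading ./ makes this a relative URL; Git adopts the existing repo.
git submodule --quiet add ./sub1 sub1

url=$(git config -f .gitmodules submodule.sub1.url)
mode=$(git ls-files --stage sub1 | cut -d' ' -f1)
gitlink=$(git rev-parse :0:sub1)
sub_head=$(git -C sub1 rev-parse HEAD)

echo "url in .gitmodules: $url"
echo "index mode:         $mode"
echo "gitlink:            $gitlink"
echo "sub1 HEAD:          $sub_head"
```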

How the superproject sees submodule changes

Assume you go into the submodule and use git checkout to check out, or even create, some other commit, perhaps by doing git checkout of some branch name and then maybe working in the repository as usual and committing. Then you cd back to the superproject and run git status. Your Git will tell you that the submodule is modified (note the blank before the M here):

$ git status --short
 M dir/sub

This modification exists, but is not yet in your index, i.e., is not yet set up to be committed:

$ (cd dir/sub; git rev-parse HEAD)
860be47095f79afbf94c62d0c3936a9875905e16
$ git rev-parse :0:dir/sub
1fdcf14961c81d03496b359389058410f0169782

As you can see, the submodule is detached at 860be47095f79afbf94c62d0c3936a9875905e16, even though the index says that the next commit will contain a directive to use 1fdcf14961c81d03496b359389058410f0169782. This is exactly like any modified file in the same repository, except that you use git add here to tell Git to put the new hash ID in, rather than to copy the work-tree contents in.

Hence, once we do git add, the --short status output will move the M one letter to the left:

$ git add dir/sub
$ git status --short
M  dir/sub

because now the superproject's index entry for the submodule differs from the gitlink stored in the superproject's current (HEAD) commit, but does match the actual submodule as found in the work-tree. So now, if everything is ready and we want to tell our superproject Git to command the submodule Git to use 860be47095f79afbf94c62d0c3936a9875905e16 in the next commit we make, we're ready to make that commit now:

$ git commit
[edit a message, etc]

Again, the keys here are:

Each submodule is its own Git repository.
The superproject finds each submodule by path and/or by .gitmodules, as needed. A new clone of just the superproject obviously does not have any of the submodules cloned yet, so that's what the .gitmodules entries are good for: they provide the URL and the path!
The superproject Git can look at each submodule and find its HEAD: that gets the superproject Git the actual hash ID, and lets you git add that hash ID to the superproject's index, ready for the next commit you make in the superproject.
Or, the superproject Git can command the submodule Git to git checkout, as a detached HEAD, the one specific commit hash ID that is in the superproject's index right now.

If you want to make your superproject command multiple submodules, you git submodule add all those submodules. To make sure those submodules get the right commit hash ID checked-out as detached HEADs, you enter the submodules, put them on the right commits, and then git add the submodules in the superproject.
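Putting this together for the original question, here is a sketch that rolls sub1 back by one commit while leaving sub2 alone. It assumes sub1 and sub2 really are separate repositories registered as submodules; the file contents, commit messages, and demo identity are invented for the example.

```shell
#!/bin/sh
# Sketch: return one submodule to its previous commit, keep the other as-is.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

root=$(mktemp -d)
cd "$root"
git init -q
git commit -q --allow-empty -m "initial commit"

# Two independent repositories, each adopted as a submodule.
for name in sub1 sub2; do
    git init -q "$name"
    echo "first version" > "$name/$name.txt"
    git -C "$name" add "$name.txt"
    git -C "$name" commit -q -m "$name: first version"
    git submodule --quiet add "./$name" "$name"
done
git commit -q -m "Added submodules"

# One change in each submodule, each recorded in the superproject.
for name in sub1 sub2; do
    echo "second version" > "$name/$name.txt"
    git -C "$name" commit -q -am "$name: second version"
    git add "$name"
    git commit -q -m "updated $name"
done
sub2_before=$(git -C sub2 rev-parse HEAD)

# Roll back sub1 only: the checkout runs inside sub1's own repository...
git -C sub1 checkout -q HEAD~1     # detaches sub1's HEAD at its previous commit
# ...and the superproject then records the older hash as sub1's gitlink.
git add sub1
git commit -q -m "pin sub1 to its previous commit"

echo "sub1.txt: $(cat sub1/sub1.txt)"
echo "sub2.txt: $(cat sub2/sub2.txt)"
```

The checkout inside sub1 cannot touch sub2, because the two have separate histories; the superproject only stores one hash per submodule.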

In modern Git, the git submodule command has some fairly-fancy tricks to coordinate updating submodules using branch names found in the remote (origin, usually) for the submodule. The idea here is that if you are using, say, Google gRPC, and you want to upgrade, git submodule can replace several of the above steps—cd-ing into the submodule, running git fetch, running git checkout, and cd-ing back—with one step. But the actual design of submodules is still "detached HEAD as commanded by superproject": it's up to you to make sure that the superproject Git repository records the correct submodule hash IDs.
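The one-step form is git submodule update --remote. The sketch below assumes a local stand-in "upstream" repository whose branch is forced to master, a demo identity, and protocol.file.allow=always for the local-path URL. Note that the final git add and git commit are still your job.

```shell
#!/bin/sh
# Sketch: upgrade a submodule to its remote branch tip in one step.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master   # predictable branch name
git -C upstream commit -q --allow-empty -m "library v1"

git init -q super
cd super
git commit -q --allow-empty -m "initial commit"
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" dir/sub
git commit -q -m "add library"
old=$(git rev-parse HEAD:dir/sub)

# Upstream moves on.
git -C "$work/upstream" commit -q --allow-empty -m "library v2"

# Fetch in the submodule and check out the remote branch tip, detached.
git -c protocol.file.allow=always submodule --quiet update --remote dir/sub
new=$(git -C dir/sub rev-parse HEAD)

# The superproject still has to record the new hash itself.
git add dir/sub
git commit -q -m "upgrade library"
recorded=$(git rev-parse HEAD:dir/sub)

echo "old:      $old"
echo "new:      $new"
echo "recorded: $recorded"
```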
