24n8

Reputation: 2246

How to remove submodule changes in a local commit?

I have a local git commit that hasn't been pushed to remote yet. I accidentally added changes from a submodule into the local commit. If it weren't part of the commit, I know I can just do a git reset, but because it's already part of the commit, I'm not sure what to do.

I went into the submodule and did a git reset --hard origin and then amended my commit, but this didn't seem to do anything.

Upvotes: 3

Views: 1282

Answers (1)

torek

Reputation: 489848

Short version of what you need to do, in order:

  1. enter the submodule (e.g., cd path-to-submodule);
  2. select the desired commit (e.g., git checkout hash, git switch --detach hash, or even git reset --hard hash);
  3. return to the superproject (e.g., cd - or cd ../../.. depending on the path to the submodule);
  4. git add path-to-submodule;
  5. git commit --amend.
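The five steps above can be sketched as a small Python wrapper. This is a hypothetical helper, not part of Git. It assumes git is on your PATH and that good_hash is the submodule commit you want the superproject commit to record. It also uses git -C <dir> in place of cd. The run parameter lets you print the commands as a dry run before executing anything.

```python
import os
import subprocess
from typing import Callable, List

# A runner takes one argv list. The default really runs Git; pass a
# different callable (e.g. list.append) to get a dry run instead.
Runner = Callable[[List[str]], None]


def run_git(argv: List[str]) -> None:
    subprocess.run(argv, check=True)  # raise if any step fails


def fix_submodule_in_last_commit(superproject: str, submodule_path: str,
                                 good_hash: str, run: Runner = run_git) -> None:
    """Hypothetical helper: steps 1-5 above, using `git -C <dir>` instead of cd."""
    # Steps 1-2: in the submodule, detach HEAD at the commit you want recorded.
    run(["git", "-C", os.path.join(superproject, submodule_path),
         "checkout", "--detach", good_hash])
    # Steps 3-4: in the superproject, stage the updated gitlink.
    run(["git", "-C", superproject, "add", submodule_path])
    # Step 5: replace the unpushed tip commit, keeping its message.
    run(["git", "-C", superproject, "commit", "--amend", "--no-edit"])


# Dry run with placeholder names: print the commands instead of running them.
planned: List[List[str]] = []
fix_submodule_in_last_commit("myproject", "libs/sub", "abc1234",
                             run=planned.append)
for argv in planned:
    print(" ".join(argv))
```

Dropping run=planned.append makes the same calls run for real, and check=True stops at the first failing step.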

You've already done steps 1 and 2, assuming origin resolves to the appropriate hash ID. (Note that origin, as a name for a hash ID, resolves to origin/HEAD which in turn resolves to refs/remotes/origin/HEAD: see step 6 of the six-step process outlined in the gitrevisions documentation. You can run git rev-parse origin/HEAD within the submodule repository so as to see the raw hash ID obtained here.)
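To see why origin ends up meaning refs/remotes/origin/HEAD, here is a toy model of the six-rule lookup from the gitrevisions documentation. The ref names and short hash IDs are made-up placeholders, and real Git also handles packed refs, ambiguity warnings, and more. This only illustrates the order in which the candidates are tried.

```python
from typing import Dict, List, Optional

# Made-up ref store. A value starting with "ref: " is a symbolic ref
# (refs/remotes/origin/HEAD usually is one); anything else is a hash ID.
REFS: Dict[str, str] = {
    "refs/heads/main": "1111111",
    "refs/remotes/origin/main": "2222222",
    "refs/remotes/origin/HEAD": "ref: refs/remotes/origin/main",
}


def candidates(name: str) -> List[str]:
    """The six places gitrevisions checks for <refname>, in order."""
    return [
        name,                         # 1. $GIT_DIR/<refname> (HEAD, FETCH_HEAD, ...)
        f"refs/{name}",               # 2.
        f"refs/tags/{name}",          # 3.
        f"refs/heads/{name}",         # 4.
        f"refs/remotes/{name}",       # 5.
        f"refs/remotes/{name}/HEAD",  # 6.
    ]


def resolve(name: str, refs: Dict[str, str]) -> Optional[str]:
    for ref in candidates(name):
        value = refs.get(ref)
        # Follow symbolic refs until reaching a raw hash ID (or a dead end).
        while value is not None and value.startswith("ref: "):
            value = refs.get(value[len("ref: "):])
        if value is not None:
            return value  # first match wins
    return None


print(resolve("origin", REFS))  # 2222222, found via rule 6
```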

Long-ish explanation with details

A Git commit is made up of two parts, which I normally describe this way:

  • Each commit holds a full snapshot of all of the files Git knows about.

  • Each commit also holds some metadata, or information about the commit.
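To make the two parts concrete: in a repository using Git's default SHA-1 object format, a commit is stored as a short text object. Its tree line names the snapshot, and its remaining lines are metadata. The sketch below builds such an object by hand. The author identity, timestamp, and message are made-up placeholders.

```python
import hashlib
from typing import List


def object_id(kind: str, body: bytes) -> str:
    """Hash ID of a Git object: SHA-1 over "<kind> <size>\\0" plus the body."""
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


def commit_body(tree: str, parents: List[str], ident: str, message: str) -> bytes:
    lines = [f"tree {tree}"]                    # the snapshot
    lines += [f"parent {p}" for p in parents]   # metadata from here on
    lines += [f"author {ident}", f"committer {ident}"]
    return ("\n".join(lines) + "\n\n" + message).encode()


EMPTY_TREE = object_id("tree", b"")  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
IDENT = "A U Thor <author@example.com> 1700000000 +0000"  # placeholder identity

body = commit_body(EMPTY_TREE, [], IDENT, "initial commit\n")
print(body.decode())
print("commit id:", object_id("commit", body))
```

Changing anything in either part (a different tree, a different message) produces a different commit hash ID, which is why fixing a commit always means making a new one.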

When dealing with submodules, the only real change here is in the first point. Besides the files, the superproject commits also contain gitlinks, one for each submodule. A gitlink is a lot like a symbolic link, only different:

  • A symbolic link, which is supported on most Unix-like file systems and some Windows systems, is in essence a file that contains the name of another file. The operating system is set up so that when you ask to open and read a file (e.g., path/to/file.ext), the OS notices that this is a symlink rather than a regular file. It reads the link's contents, finds that path/to/file.ext contains, say, the text string ../../other.name, and combines that with the link's directory to get path/to/../../other.name. It thus opens and reads other.name in the current directory (the two .. components back up over to and path, respectively).

  • A gitlink is interpreted by Git itself: it's a raw hash ID for some commit in some other Git repository.
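Here is a sketch of what a gitlink looks like inside a superproject tree object, assuming the default SHA-1 format. A tree entry is the mode, a space, the name, a NUL byte, and the 20-byte raw hash ID. A gitlink uses mode 160000, and its hash ID names a commit in the submodule repository rather than an object in the superproject. The file name and hash IDs are placeholders.

```python
import hashlib
from typing import List, Tuple


def object_id(kind: str, body: bytes) -> str:
    return hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()


def tree_entry(mode: str, name: str, hex_id: str) -> bytes:
    # "<mode> <name>\0" followed by the raw (binary) 20-byte hash ID.
    return f"{mode} {name}\0".encode() + bytes.fromhex(hex_id)


def tree_id(entries: List[Tuple[str, str, str]]) -> str:
    # Entries are sorted by name. (Real Git sorts subdirectories as if their
    # names ended in "/"; this sketch only holds files and gitlinks.)
    body = b"".join(tree_entry(*e) for e in sorted(entries, key=lambda e: e[1]))
    return object_id("tree", body)


readme = object_id("blob", b"hello\n")
wrong = "b" * 40   # placeholder: submodule commit you committed by accident
right = "a" * 40   # placeholder: submodule commit you meant to record

before = tree_id([("100644", "README", readme), ("160000", "sub", wrong)])
after = tree_id([("100644", "README", readme), ("160000", "sub", right)])
print(before != after)  # True: only the gitlink differs, yet the snapshot changed
```

Note that the gitlink entry holds just those 20 bytes; none of the submodule's files appear in the superproject's tree.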

Each file entry in a commit has a path name such as path/name, and a gitlink is no exception: it has a name, path/name or whatever. Then it has a raw hash ID. Git reads the name and looks in a separate table of submodules (filled in, initially, from the .gitmodules file at the top of the commit's snapshot). The table says that submodule path/name is to be cloned from some URL, so git submodule update --init will run git clone on that URL and clone the repository. It then proceeds to do the regular git submodule update as below.
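A simplified sketch of that table lookup, assuming a .gitmodules file with one hypothetical submodule. Real Git reads .gitmodules with its full config-file parser and copies the entries into .git/config when you run git submodule init. This toy parser skips quoting, escapes, and includes.

```python
import re
from typing import Dict

GITMODULES = """\
[submodule "path/name"]
\tpath = path/name
\turl = https://example.com/hypothetical/sub.git
"""


def parse_gitmodules(text: str) -> Dict[str, Dict[str, str]]:
    """Return {submodule path: settings}; a toy parser, not Git's."""
    by_name: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        header = re.fullmatch(r'\[submodule "(.+)"\]', line)
        if header:
            current = by_name.setdefault(header.group(1), {})
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    # Gitlinks are found by path, so index the table by each entry's path.
    return {s["path"]: s for s in by_name.values() if "path" in s}


table = parse_gitmodules(GITMODULES)
print(table["path/name"]["url"])  # https://example.com/hypothetical/sub.git
```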

Later (or right now), any git submodule update will:

  • enter the submodule (cd path/name in this case);
  • use the raw hash ID from the gitlink to run git checkout hash or git switch --detach hash.

(The checkout vs switch distinction vanishes in Git versions predating Git 2.23, where git switch was added, but by using --detach this is a distinction without a difference: both do the same thing. There's also a git fetch step in the submodule. I am eliding it on purpose as it's optional and a little tricky. If your submodule clone is a full clone, and there are no new submodule commits, the git fetch step doesn't do anything, so we can ignore it here. I'm also eliding certain options you can manually pass to git submodule update as they make things more confusing, without really enlightening anyone at this point.)
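The two steps of git submodule update described above can be modeled as a command plan. This is a deliberately simplified, hypothetical model. It leaves out the fetch step, as the text does. Real Git also does more work than a plain git clone, for example in where it stores the submodule's repository.

```python
from typing import Dict, List, Set


def submodule_update_plan(gitlinks: Dict[str, str], urls: Dict[str, str],
                          cloned: Set[str]) -> List[List[str]]:
    """Toy `git submodule update --init`, minus the fetch step.

    gitlinks: submodule path -> raw hash ID from the superproject commit
    urls:     submodule path -> URL, from the submodule table
    cloned:   paths that already hold a clone
    """
    plan = []
    for path, commit in sorted(gitlinks.items()):
        if path not in cloned:
            plan.append(["git", "clone", urls[path], path])
        # Enter the submodule and detach HEAD at the recorded commit.
        plan.append(["git", "-C", path, "checkout", "--detach", commit])
    return plan


links = {"path/name": "a" * 40}  # placeholder gitlink
urls = {"path/name": "https://example.com/hypothetical/sub.git"}
for argv in submodule_update_plan(links, urls, cloned=set()):
    print(" ".join(argv))
```

Once the clone exists, only the checkout step remains, which is why a later update just moves the submodule's detached HEAD.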

Using --recurse-submodules (as in git clone --recurse-submodules, which also accepts --recursive, git checkout --recurse-submodules, or git switch --recurse-submodules) tells Git to do the git submodule update work automatically, so that you don't have to think about it, but doesn't change the fundamental process: Git first checks out the commit in the superproject so as to obtain the gitlink path and raw hash ID, and then clones (in the git clone case) and/or enters the submodule as needed and checks out the commit whose hash ID is given by the gitlink. In all of these cases you wind up with the submodule checked out as a "detached HEAD". That's the normal way submodules work, and it's why, in step 2 in the short version above, I recommend git checkout or git switch --detach rather than git reset --hard (though all will work).

What the long-ish explanation means

The superproject repository does not contain any submodule files. Instead, it contains:

  • a .gitmodules file, which gives the instructions Git needs to run git clone; and
  • for each submodule, a gitlink giving a path and a raw hash ID.

Hence, as you make new commits in the superproject, you're putting two things into each of these commits to go with your superproject files:

  1. You're committing another copy of .gitmodules. As this is probably exactly the same as every previous copy, the new commit's copy is literally shared with all the previous commits' copies, so it takes no extra space. It's a "virtual" copy rather than a "physical" copy. But it's still a "copy", as far as thinking about it goes.

  2. You're committing a gitlink. This gitlink has a path name—that's the submodule's path name—and a raw hash ID. When you run git add in step 4 in the short version, what you're doing is setting up your next commit—the one you make in step 5—to hold the desired hash ID. To make sure you get the right hash ID, you execute step 2 in the short version.
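Both points can be seen in a toy content-addressed object store (default SHA-1 format, hypothetical helper names). Storing two snapshots whose .gitmodules text is identical but whose gitlinks differ yields a single shared .gitmodules blob and two different trees.

```python
import hashlib
from typing import Dict, Tuple

store: Dict[str, Tuple[str, bytes]] = {}  # hash ID -> (type, body)


def put(kind: str, body: bytes) -> str:
    oid = hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()
    store[oid] = (kind, body)  # identical content maps to the same key
    return oid


def snapshot(gitmodules: bytes, sub_commit: str) -> str:
    """Top-level tree holding a .gitmodules file and a gitlink named "sub"."""
    blob = put("blob", gitmodules)
    # The gitlink's commit is NOT put in this store; it lives in the submodule.
    body = (b"100644 .gitmodules\0" + bytes.fromhex(blob) +
            b"160000 sub\0" + bytes.fromhex(sub_commit))
    return put("tree", body)


text = b'[submodule "sub"]\n\tpath = sub\n\turl = ../sub.git\n'
t1 = snapshot(text, "b" * 40)  # placeholder: the accidental submodule commit
t2 = snapshot(text, "a" * 40)  # placeholder: the intended submodule commit
blobs = [oid for oid, (kind, _) in store.items() if kind == "blob"]
print(len(blobs), t1 != t2)  # 1 True
```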

Running git add path/name, assuming the path to the submodule is path/name, is really an instruction to the superproject Git:

  • enter the submodule;
  • run git rev-parse HEAD to get the raw hash ID;
  • leave the submodule;
  • update the gitlink in the index / staging-area.

The git commit then makes the new commit from the superproject's index / staging-area as usual, but because you ran git add, the new commit has a new gitlink hash ID.
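The four steps that git add performs for a submodule, followed by the commit, can be modeled with a plain dictionary standing in for the index. This is a hypothetical model for illustration. The hash IDs are placeholders, and the submodule_head argument stands in for the result of git rev-parse HEAD inside the submodule.

```python
from typing import Dict, Tuple

Index = Dict[str, Tuple[str, str]]  # path -> (mode, hash ID)


def add_submodule(index: Index, path: str, submodule_head: str) -> None:
    """Model of `git add <path>` for a submodule: record its HEAD as a gitlink."""
    index[path] = ("160000", submodule_head)


def commit_snapshot(index: Index) -> Index:
    # The new commit's snapshot is a frozen copy of the whole index.
    return dict(index)


index: Index = {
    ".gitmodules": ("100644", "1" * 40),
    "path/name": ("160000", "b" * 40),  # the accidentally staged gitlink
}
add_submodule(index, "path/name", "a" * 40)  # after step 2, HEAD is a...a
snapshot = commit_snapshot(index)
print(snapshot["path/name"][1] == "a" * 40)  # True
```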

Using git commit --amend as we do in the short version means that we kick the previous commit off the end of the current (superproject) branch and add instead a different new commit. The different new commit has the correct gitlink; the previous tip commit was mostly correct but had the wrong gitlink.
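A small model of what --amend does to the branch, using hypothetical names and placeholder hash IDs. The replacement commit reuses the old tip's parent (and, with --no-edit, its message), and the branch name simply moves to it. The old commit is not deleted right away; it just stops being the branch tip, and Git's reflog still remembers it for a while.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional["Commit"]
    gitlink: str   # submodule hash ID recorded in this commit's snapshot
    message: str


def amend(tip: Commit, new_gitlink: str) -> Commit:
    """Like `git commit --amend --no-edit`: same parent and message, new snapshot."""
    return Commit(parent=tip.parent, gitlink=new_gitlink, message=tip.message)


base = Commit(None, "a" * 40, "earlier pushed commit")
oops = Commit(base, "b" * 40, "my unpushed change")
branches: Dict[str, Commit] = {"main": oops}

branches["main"] = amend(branches["main"], "a" * 40)
print(branches["main"].parent is base, branches["main"] is not oops)  # True True
```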

Once you can think about a gitlink as a "file" that you store in every commit, you actually understand submodules.

Upvotes: 2
