Jim Newton

Reputation: 652

Is it possible to stage multiple sets of changes separately in git?

This is a question (request for advice) about a frequently used git workflow that I've developed over years of using git.

If I have made lots of changes to a project (multiple changes in multiple files), I often iterate through all the hunks (with git gui) and decide which I want to stage together and then commit the several hunks together with one commit and a descriptive comment. Then I continue through all the remaining hunks again grouping hunks together into a common commit.

But a problem that often occurs is that, on the 3rd or 4th iteration, I discover a hunk which should have been in one of the previous commits. In this case I temporarily commit the staged changes to something called TMP, commit the hunk in question as "amend ", then continue staging the remaining hunks, and commit them with the message "amend TMP". After I've finished committing all the changes, I go back and rebase -i and reorder and squash the "amend..." commits together appropriately.

QUESTION: Is there a better way?

I've wanted to be able to go through the hunks and stage them one by one into separate sets. And when everything is staged, loop through the staging sets, and commit them one by one with an appropriate comment. I would also like to accumulate/edit the comment as I build the stage sets.

Does such a feature exist? What is it called?

One possibility might be to allow a single hunk to be appended to an existing commit, then rebase the later commits, but somehow otherwise leave the work area and staging area effectively unchanged.

This question is similar to "How to create multiple stages in Git" and "Break up multiple changes into separate commits with git?", but those discussions don't really answer my question sufficiently. Yes, it's possible, but it seems like there must/should be a better way.

Upvotes: 13

Views: 2261

Answers (2)

Rufus

Reputation: 5566

Closest I could find is https://github.com/ustramooner/gitstage

Introduced in this blog post: http://ben.villagechief.com/2012/01/multiple-git-staging-areas.html

The workflow scenario I use is generally something like this

  1. Run git gui
  2. Realise that I need to split the commit into 2... duhhhh
  3. Close git gui without committing
  4. Run gitstage commit1, which creates a staging area called 'commit1' and then opens the git gui again
  5. Make note of what I've staged so far in the git gui Commit Message textbox
  6. Close the git gui.
     • gitstage will now create a commit of the currently staged index
     • gitstage also puts any messages you were writing in the git gui in .git/gitstage/commit1.msg
  7. Run gitstage commit2, which creates another staging area and opens git gui.
  8. Add other files/hunks/lines to the new staging area in the gui
  9. Repeat steps 4-8 multiple times. Each time you swap, the messages you write will be kept
  10. Committing: click commit in the gui and close the gui
     • this then removes the staging area
     • commit any other remaining staging areas
     • end of work flow

Upvotes: 1

torek

Reputation: 489123

The short answer is "no, there's no better way"—but you can experiment with git worktree add, perhaps. (This will run you smack into a different problem, but it may not be a problem after all.)

The problem is that there is only one index,[1] which Git calls the index, or sometimes the staging area or the cache. Meanwhile, no commit, once made, can ever be changed at all. Not even git commit --amend changes a commit (we'll get to this later).


[1] This isn't quite true. In particular, if you use git worktree add, you get one index per work-tree. Git also allows various commands to use a temporary index file; things like git stash, and even extra-complicated varieties of git commit, use this, because commands such as git write-tree always use the index as their input, but can be pointed to a temporary index. Actually using a temporary index is going to be far too tricky, though.
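
To make the temporary-index mechanism from the footnote concrete, here is a minimal sketch, assuming a throwaway repository. The GIT_INDEX_FILE environment variable points Git commands at an alternative index file; the file name index.tmp and the identity values are placeholders chosen for this example, not anything Git requires.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo one > README
git add README
git commit -q -m "initial"

echo two >> README                           # work-tree change, not staged

# Build a tree from a throwaway index; the real index is left untouched.
tmp_index="$repo/.git/index.tmp"             # hypothetical file name
GIT_INDEX_FILE="$tmp_index" git read-tree HEAD
GIT_INDEX_FILE="$tmp_index" git add README
tmp_tree=$(GIT_INDEX_FILE="$tmp_index" git write-tree)

real_tree=$(git write-tree)                  # tree built from the real index
head_tree=$(git rev-parse 'HEAD^{tree}')
rm -f "$tmp_index"

echo "real index tree: $real_tree"
echo "temp index tree: $tmp_tree"
```

The tree written from the real index still matches HEAD, while the tree written from the temporary index contains the modified README. Turning such a tree into a commit would need further plumbing (git commit-tree and a ref update), which is where this starts to get tricky.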


When you use git add -p or some fancier GUI to interactively select particular changes (diff hunks or individual lines or whatever) to add to the index, you're creating a file in the index that appears nowhere else.

Imagine a really simple repository with just one file, README. You clone the repository and are on master. The situation looks like this:

HEAD      index     work-tree
------    ------    ------
README    README    README

All three copies of README are identical (though secretly, Git's HEAD and index versions are compressed, and actually share the underlying disk storage so that there are only two on-disk images for README, the compressed Gitty one and the uncompressed plain one).

Now you fire up your favorite editor and modify README. Let's call this README;1 just to have a horrifying syntax[2] for identifying "a different version of the file". :-) Now you have this:

HEAD      index     work-tree
------    ------    ------
README    README    README;1

But, you made a big change, so you decide you want to interactively add some hunks of the change, using git add -p or whatever. Once you do this, you wind up with this:

HEAD      index     work-tree
------    ------    ------
README    README;2  README;1

That is, there are now, literally, three different versions of that file that you are working with simultaneously: the committed one that you can't change; the work-tree one that you did change; and the index one, that you created as a Frankensteinian hybrid of the HEAD and work-tree versions.

Since there is only one index, there's only one place available in which to create this intermediate version of the file. So you have to create it, then commit it. Committing makes a new commit, which gets a new unique hash ID, and then makes that new commit become the HEAD commit, so that you now have:

HEAD      index     work-tree
------    ------    ------
README;2  README;2  README;1

That frees up the index to make yet another new variant of the file: the second version of README is safely saved, completely unchangeable (and mostly-permanent), in the HEAD commit.


[2] This is, in fact, the syntax for versioned files in VMS.


In a comment, mkrieger1 linked to a question where the answers suggest using the fixup and autosquash features of git rebase, including using git commit --fixup to record a commit for autosquashing. These, like git commit --amend, make use of a very useful property of Git's branch names.

A branch name, in Git, points to exactly one commit. The set of commits contained within a branch is determined by starting from that one commit, which Git calls a tip commit, and working backwards: that commit has a parent, the parent commit has another parent, and so on. Each commit is stored under its big ugly commit hash ID: the branch name contains the hash ID of the tip commit, and each commit contains the hash ID of its parent:

... <--E <--F <--G   <-- master

We say that master "points to" the tip commit G, which points back to F, and so on.

Since no commit can ever be changed, the internal arrows always point backwards, and don't really need to get drawn, which is handy since it's hard to do well in ASCII on stackoverflow. :-) So I draw this instead as:

...--E--F--G   <-- master

(I keep the arrow in front of the branch name, because branch names do move.)

Now, this means that if we make a new commit whose parent is not the normal "current commit hash ID", but instead is the parent of the current commit, and then make the current branch name point to the new commit we just made, we seem to have replaced the current commit:

          H   <-- master
         /
...--E--F--G

We haven't actually changed any commits, but it looks like we did when we run git log, because with nothing pointing to G, Git doesn't show it. Git starts by showing us new commit H, then moves back to commit F, then moves back to E, and so on.

This is what you get with git commit --amend: the new commit simply has, as its parent, whatever parent the current (well, now ex-current) commit has.
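
A small sketch of this, assuming a scratch repository in which the commit subjects F, G and H simply mirror the letters in the drawing:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo f > file
git add file
git commit -q -m "F"
echo g >> file
git commit -q -am "G"

f=$(git rev-parse HEAD~1)
g=$(git rev-parse HEAD)

git commit -q --amend -m "H"                 # "replace" G with H

h=$(git rev-parse HEAD)
h_parent=$(git rev-parse HEAD~1)
g_type=$(git cat-file -t "$g")               # G is still in the object store

echo "G was $g"
echo "H is  $h (parent $h_parent)"
echo "G is still a $g_type object"
```

H has a different hash ID from G, its parent is F, and G can still be looked up by its hash ID even though no branch name points to it any more.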

The git rebase command takes this to a new level: instead of just one commit, we can copy many commits, making some slight change(s) as we go. With interactive rebase, we have Git use git cherry-pick on each to-be-copied commit. The copies can go after any commit you like, though for things like autosquash, you generally do the copies in-place:

...--E--F--G--H--I--J   <-- branch

where J is a fixup for H. Now you run git rebase -i --autosquash <hash of G> and Git generates the commands:

pick <hash-of-H>
pick <hash-of-I>
pick <hash-of-J>

which, if run totally straightforwardly, would result in:

             H'-I'-J'   <-- branch
            /
...--E--F--G--H--I--J   [abandoned]

But rather than running them, the autosquash feature notices that J itself has, as its one line commit subject, a prefix: fixup! <subject>. The <subject> part of this matches H's commit subject, so the autosquash code changes the instructions to:

pick <hash-of-H>
fixup <hash-of-J>
pick <hash-of-I>

Executing these instructions gives:

             HJ--I'   <-- branch
            /
...--E--F--G--H--I--J   [abandoned]

where HJ is the automatically squashed H+J, using H's commit message.
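
The whole sequence can be tried in a scratch repository. This is a sketch under a few assumptions: the file names and commit subjects are made up, and GIT_SEQUENCE_EDITOR=true is used only so that the generated instruction list is accepted unchanged without opening an editor.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo g > g.txt
git add g.txt
git commit -q -m "G"
g=$(git rev-parse HEAD)

echo h > h.txt
git add h.txt
git commit -q -m "H: add h.txt"
echo i > i.txt
git add i.txt
git commit -q -m "I: add i.txt"

# J: a fixup for H; its subject becomes "fixup! H: add h.txt".
echo more >> h.txt
git add h.txt
git commit -q --fixup=HEAD~1

# Let autosquash reorder the list, then accept it as generated.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$g"

count=$(git rev-list --count "$g"..HEAD)
subjects=$(git log --reverse --format=%s "$g"..HEAD)
h_content=$(git show HEAD~1:h.txt)

echo "commits after G: $count"
echo "$subjects"
```

After the rebase there are two commits on top of G: the squashed HJ, which keeps H's message and contains both changes to h.txt, followed by the copy of I.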


Now, once again, the problem here is that there's just the one index, and you're building your intermediate images in that one index.

If you use git worktree add, you can make as many work-trees as you like. Each has its own index, and of course its own separate work-tree as well. But there's one very strong constraint Git imposes: each work-tree must be on a different branch.

This may not be a problem after all. Remember that in Git, branches are insanely cheap: making a new branch costs just one disk block, holding one 41-byte file. (A future implementation might change these details but branches will remain absurdly cheap.)

Let's go back to this drawing:

...--E--F--G--H--I--J   <-- branch (HEAD)

We can create a new branch now, and all that Git does is write a file that also points to commit J. This is why we add (HEAD) to the drawing, so that we know which branch our work-tree is using:

...--E--F--G--H--I--J   <-- branch (HEAD), br2

We can now add, or copy, or rebase, or whatever, however we like. The new commits, which are entirely read-only and mostly permanent, are safe and always separate from the old commits. The new name, br2, safely keeps the original commits, no matter what we do with them. Or, we can switch HEAD to br2 and let the old name, branch, keep the original commits safe:

...--E--F--G--H--I--J   <-- branch, br2 (HEAD)

Now let's do something with H-I-J just like before:

...--E--F--G--H--I--J   <-- branch
            \
             HJ--I'   <-- br2 (HEAD)
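
Here is a sketch of that last drawing, assuming a scratch repository with the branch names branch and br2 taken from the drawings; the file names and the use of GIT_SEQUENCE_EDITOR=true to accept the instruction list unchanged are choices made for this example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo g > g.txt
git add g.txt
git commit -q -m "G"
g=$(git rev-parse HEAD)

git checkout -q -b branch
echo h > h.txt
git add h.txt
git commit -q -m "H"
echo i > i.txt
git add i.txt
git commit -q -m "I"
echo more >> h.txt
git add h.txt
git commit -q --fixup=HEAD~1                 # J, a fixup for H
j=$(git rev-parse HEAD)

# New name br2 points to J as well, and HEAD is now attached to br2.
git checkout -q -b br2
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$g"

branch_tip=$(git rev-parse refs/heads/branch)
branch_count=$(git rev-list --count "$g"..refs/heads/branch)
br2_count=$(git rev-list --count "$g"..refs/heads/br2)

echo "branch: $branch_count commits after G, tip $branch_tip"
echo "br2:    $br2_count commits after G"
```

The name branch still points to the original J with H, I and J behind it, while br2 holds the two rewritten commits.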

If you make a new worktree, you can make it have a new branch. The new worktree shares all the old commits, and all the old branches.

Git prohibits you from having two work-trees using the same branch because then they would both point to the same commit and yet have two index files and two work-trees. When you made a new commit in one of these, that would use the index in that one, and change the shared branch to point to the new commit. The result is that the other of these two would still have its old (now stale) index and old (now stale) work-tree. The Git authors deemed this to be too confusing, and simply outlawed it. Because branches are so cheap, this is actually pretty reasonable: just make a new branch for each new work-tree.
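
Both points can be checked with a short sketch, assuming a scratch repository; the directory names main, wt2 and wt3 are arbitrary. git rev-parse --git-path index reports which index file a given work-tree uses.

```shell
set -eu
base=$(mktemp -d)
git init -q "$base/main"
cd "$base/main"
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo readme > README
git add README
git commit -q -m "initial"
current=$(git symbolic-ref --short HEAD)     # branch checked out here

# A second work-tree on a brand-new branch br2.
git worktree add -b br2 "$base/wt2" > /dev/null 2>&1

main_index=$(git rev-parse --git-path index)
wt2_index=$(git -C "$base/wt2" rev-parse --git-path index)

# A further work-tree on the branch already checked out here is refused.
if git worktree add "$base/wt3" "$current" > /dev/null 2>&1; then
    same_branch=allowed
else
    same_branch=refused
fi

echo "main index: $main_index"
echo "wt2 index:  $wt2_index"
echo "second work-tree on $current: $same_branch"
```

The two work-trees report different index files, and the attempt to check out the same branch twice fails.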

The advantage you have here is that you not only have multiple index files, you also have multiple work-trees. You can stop trying to play index file tricks with extra versions of each file (git add -p): just make the work-tree file look the way you want, and then test it, and then commit it. All of these are in temporary work-trees on what could be temporary branches, if they don't work out all that well. If they do work out well, use the best one as the final result. Simply remove (rm -rf) all the lesser work-trees and (git branch -D) "didn't work out after all" temporary branches once you're satisfied.
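
One way this could look in practice is sketched below, assuming a scratch repository. The branch name attempt1, the grep standing in for a real test, and the fast-forward merge used to adopt the result are all illustrative assumptions. git worktree prune is added after the rm -rf so that Git forgets the deleted work-tree before its branch is deleted.

```shell
set -eu
base=$(mktemp -d)
git init -q "$base/main"
cd "$base/main"
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

printf 'alpha\nbeta\n' > README
git add README
git commit -q -m "initial"

# Temporary work-tree on a temporary branch.
git worktree add -b attempt1 "$base/attempt1" > /dev/null 2>&1

# Make the file look the way you want, test it, then commit it.
cd "$base/attempt1"
printf 'alpha\nbeta\ngamma\n' > README
grep -q gamma README                         # stand-in for a real test
git commit -q -am "add gamma"
kept=$(git rev-parse HEAD)

# Adopt the result in the main work-tree, then discard the temporaries.
cd "$base/main"
git merge -q --ff-only attempt1
rm -rf "$base/attempt1"
git worktree prune
git branch -q -D attempt1

main_tip=$(git rev-parse HEAD)
remaining=$(git worktree list | wc -l | tr -d ' ')

echo "main tip: $main_tip"
echo "work-trees left: $remaining"
```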

Upvotes: 6
