Reputation: 12187

How to prepare clear diffs, when `git add --patch` edits are awkward?

SITUATION: When making a Pull Request, I want the receiver to be able to understand what changes it makes. I find that squashing them into one commit can be confusing, especially if:

1. there are edits to code which is also moved - diff renders this as wholesale deletion and addition, without highlighting the edits.
2. code is added to a series of similar sections, e.g. case statements, cascading ifs, yacc productions - diff often reconstructs the change as overlapping sections (e.g. instead of adding a whole section, it reuses the beginning of the previous section, adds a new ending and another beginning, then reuses the ending of that previous section); in some cases, it picks out a few minor similarities, then deletes and inserts a mass of identical code. (I realize diff uses LCS and is amazingly fast - but sometimes its result is hard to fathom, even when considering that diff isn't syntax-aware and can't recognize the code "sections" you see.)

BTW: I use git diff --color-words --ignore-space-change, which is great, but it also mis-reconstructs and can hide detail - and I'm concerned that the recipient might use plain git diff and see something quite different (it can reconstruct the change differently).
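As a side note on diff reconstruction: git ships alternative diff algorithms (`--patience`, `--histogram`) and a moved-line highlighter (`--color-moved`, in reasonably recent git versions). Whether any of them reads better depends on the code in question, so the sketch below only shows how to compare them side by side. The file names and contents are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare how different diff options render the same change.
# old.txt / new.txt are hypothetical stand-ins for a file with repeated "sections".
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'case A:\n  a();\n  break;\ncase B:\n  b();\n  break;\n' > old.txt
printf 'case A:\n  a();\n  break;\ncase N:\n  n();\n  break;\ncase B:\n  b();\n  break;\n' > new.txt

# --no-index compares two plain files; exit status 1 only means "files differ",
# hence the "|| true" to keep "set -e" from aborting.
default_out=$(git diff --no-index old.txt new.txt || true)
patience_out=$(git diff --no-index --patience old.txt new.txt || true)
histogram_out=$(git diff --no-index --histogram old.txt new.txt || true)
moved_out=$(git diff --no-index --color=always --color-moved=zebra old.txt new.txt || true)

printf '%s\n' "--- default ---"   "$default_out"
printf '%s\n' "--- patience ---"  "$patience_out"
printf '%s\n' "--- histogram ---" "$histogram_out"
```

None of this controls what the recipient sees with their own settings, so splitting into separate commits may still be needed.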

TASK: OK, so the obvious solution to this is to divide up the Pull Request into separate commits. Sometimes, these can be the actual commits I started with, so all I need do is not rebase/squash in the first place. But I'm finding that even then, the diffs can be unclear (especially for reason (2) above), and I need to separate them further.

The obvious way to do this is to use git add --patch/-p. However, patches are hard to work with for overlapping changes - you can divide and even edit the hunks, but it's somewhat mindbending to think in terms of reversing diffs when the change you want combines addition, deletion and common code.
What I've actually done is to edit the file directly: deleting the part I don't want and committing that; then undoing that deletion (with my editor), and committing that. Working in terms of the actual source is much clearer and more intuitive than working in terms of the diffs - but it feels like I'm fighting against git and doing it wrong (also, it seems accident-prone to rely on editor undo).
It occurs to me to instead first git stash the file, and prepare the first commit by deleting the part I don't want; then git stash apply to "undo" that delete to prepare the second commit. But I'm not sure that you can do that in the middle of a rebase (haven't tried it yet).

QUESTION: It's taking me hours to do all this... I guess I'll improve with practice but... Am I on the right track? Is there a better way? Can you prevent mis-reconstructed diffs in the first place? Am I working too hard for clarity?

(To be fair, this was many edits on subtle and complex code done a while ago - and spending these hours revealed deeper insights.)

Upvotes: 2

Answers (2)

13ren

Reputation: 12187

Why not use the commits to undo the changes (instead of stash), since we already have them? There are two problems: referencing the commit, and getting the files (working tree) and index in the right state.

Referencing the commit: We could cut-and-paste the commit's hash. Or, create a temporary tag, with git tag tmp (delete with git tag -d tmp). Or, count the commits n back from the branch tip and use branch~n. Or, for the commit that rebase is amending right now, use the hash it stored, with cat .git/rebase-merge/amend (but that is awkward and relies on an undocumented implementation detail - I got info here).
Files and index: My current understanding: reset and checkout will not change HEAD when you specify a file (paths). When used like this, reset changes the index only; checkout changes both index and files. To change only the file, you can clobber it with git show <commit>:file > file (note the odd : syntax for files instead of --).
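To check that understanding, here is a minimal throwaway-repo sketch. The file name, contents, and commit messages are made up for illustration. It records what the index and the working file contain after each of the three commands.

```shell
#!/bin/sh
# Sketch: effect of path-limited reset, path-limited checkout, and "git show >"
# on the index versus the working file. Runs in a temporary repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

echo one > myfile; git add myfile; git commit -q -m "v1"
echo two > myfile; git add myfile; git commit -q -m "v2"

# 1. reset with a path: only the index takes the old content
git reset -q HEAD~1 -- myfile
reset_index=$(git show :myfile)        # ":myfile" reads the staged copy
reset_file=$(cat myfile)
git reset -q --hard HEAD               # back to a clean "v2" state

# 2. checkout with a path: index and working file both take the old content
git checkout -q HEAD~1 -- myfile
checkout_index=$(git show :myfile)
checkout_file=$(cat myfile)
git reset -q --hard HEAD

# 3. git show with redirection: only the working file is clobbered
git show HEAD~1:myfile > myfile
show_index=$(git show :myfile)
show_file=$(cat myfile)

head_subject=$(git log -1 --format=%s) # HEAD never moved in any of the above

echo "reset:    index=$reset_index file=$reset_file"
echo "checkout: index=$checkout_index file=$checkout_file"
echo "show:     index=$show_index file=$show_file"
echo "HEAD:     $head_subject"
```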

Putting it together:

git checkout -b newbranch  # I'm on a dev branch already; make a new one
git rebase -i master       # only the commits not part of master
...mark one with `edit` or `e`...

git tag tmp                # remember the commit being split
git reset HEAD^            # moves HEAD back and resets the index; files untouched, as if we had just edited
...edit myfile, deleting what is to be split into another commit...
git add .
git commit -m "first commit"

git tag tmp2
git checkout tmp -- myfile # get file and index before above edit
git reset tmp2             # ...so need to reset *index* to first commit
                           # 1. index is same as "first commit"
                           # 2. file is same as commit we wanted to split
                           # (the diff is what we deleted above)
git add .
git commit -m "second commit"

git rebase --continue
git tag -d tmp tmp2        # clean up

The second commit is slightly simpler if we use 'git show', because we don't need git reset tmp2:

git show tmp:myfile > myfile   # clobber file, but not index
git add .
git commit -m "second commit"

It's hard to tell what's happening in all this! Some ways to check the current state:

git log -1                 # see HEAD
git diff                   # between files and index
git diff --cached HEAD     # between index and HEAD
git show-ref tmp           # see tag

Anyway, this all seems far more complicated than just undoing within my editor, which I did in the first place. But I bet this better understanding of reset and checkout will come in handy...

Upvotes: 1

13ren

Reputation: 12187

Based on these answers, after starting an interactive rebase (git rebase -i ...) and editing one commit:

git reset HEAD^     # moves HEAD and index back to the previous commit (files unchanged)
                    # so it's as if you are just about to add and commit
git stash           # save
git stash apply     # get it back
...edit the file, deleting the changes you don't want in the first commit
git add .
git commit -m "...first changes..."

git stash apply     # get it back again (ie undo the above delete)
...(I needed to resolve a merge conflict)
git add .
git commit -m "...second changes..."

git rebase --continue

A pity there isn't a git stash copy that saves your changes without reverting. There might be a smoother way to do this.
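Something close to a "stash copy" can be sketched with git stash create plus git stash store: create builds the stash commit without touching the files, and store records it in the stash list. The sketch below runs in a throwaway repo with a made-up file and commit messages. It pulls the full file back with a path-limited checkout rather than git stash apply, which is a swapped-in choice on my part to sidestep the merge conflict.

```shell
#!/bin/sh
# Sketch: save uncommitted changes as a stash entry without reverting them,
# then split them into two commits. Runs in a temporary repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

printf 'a\n' > myfile; git add myfile; git commit -q -m "base"
printf 'a\nb\nc\n' > myfile             # uncommitted edits we want to split

# "stash copy": build a stash commit, keep the working file as it is
snapshot=$(git stash create)
git stash store -m "copy before splitting" "$snapshot"
after_copy=$(cat myfile)                # edits are still in place

# First commit: keep only part of the change
printf 'a\nb\n' > myfile
git commit -q -am "first changes"

# Second commit: restore the full version from the stash entry
git checkout -q "stash@{0}" -- myfile
git commit -q -am "second changes"

commit_count=$(git rev-list --count HEAD)
final=$(cat myfile)
git log --oneline
```

The stash entry stays in the list afterwards; git stash drop removes it.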

The surprising thing to me is that you can use the full power of git right there in the middle of an interactive rebase. You can ignore the old commit you're "supposed to" be editing, and instead add two commits; you can stash and apply. I probably need to study how rebase is actually implemented, and stop thinking of it as an abstraction. Actually, the rebase manpage has a heading for splitting commits.

Upvotes: 2
