Reputation: 60758

Split a git branch or commit by file type

I have a branch containing HTML and JavaScript code. For cross-cutting reasons, I need to submit a mass change to the HTML first and then, at a later date, the JS. At the moment I have a single branch with both kinds of changes in it.

How can I split the commit into two smaller commits (or branches) by file type?

Upvotes: 0

Answers (2)

djechlin

Reputation: 60758

I hacked it together from this answer:

git checkout -b new-branch master  # create new-branch starting from master
git diff big-branch --name-only -- '*.html' | xargs git checkout big-branch --

You may need to pipe the file list through sed 's, ,\\&,g' if you have spaces in file names.

I couldn't get it to work using a **/*.html style pathspec and I don't know why.
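As an alternative to escaping spaces with sed, here is a minimal sketch that passes NUL-delimited file names from git diff to xargs. The throwaway repository, the file names, and the user identity are all hypothetical, and it assumes an xargs that supports -0 (GNU and BSD both do).

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"              # hypothetical identity for the demo
git config user.email "example@example.com"

# Hypothetical starting point on master.
echo base > "my page.html"
echo base > app.js
git add . && git commit -q -m "base"

# The big branch changes both kinds of files in one commit.
git checkout -q -b big-branch
echo changed > "my page.html"
echo changed > app.js
git commit -q -am "html and js together"

# New branch from master; pull over only the HTML files.
git checkout -q -b new-branch master
# -z separates names with NUL bytes, so spaces in names need no escaping.
git diff big-branch --name-only -z -- '*.html' | xargs -0 git checkout big-branch --
git status --short
```

If no file matches, some xargs implementations still run the command once with no paths, which would switch branches instead, so it is worth checking the diff output first.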

Upvotes: 0

torek

Reputation: 487993

You can't change the existing commit, but you can make new commits whose parent is the same commit as the existing "committed too much" commit.

Before you even start, make sure you have a clean work tree ("nothing to commit"). That way no git reset or similar command can lose anything. If you have uncommitted work, you can make a new commit to save it; in that case you will need zorg~2 instead of zorg~1 (see the diagrams below), and you will be able to retrieve the saved work from this commit later.

Draw what you have right now

As usual with Git, start by drawing (at least some part(s) of) your commit graph. You are on some branch now, which means that you have the branch name pointing to the tip-most commit, and that commit pointing back to some parent commit, and so on:

...--A--B--C--D   <-- zorg

where zorg is your current branch, D is presumably the too-big commit, and C is the commit before it, which has neither set of changes. (If you had to make some more commits, commit D may be one or more steps back from the tip; if so, adjust the numbers below.)

Hint: use git log --graph --oneline --decorate (with or without --all as well) to get Git to draw the graph for you (though it draws vertically, with more-recent stuff at the top, instead of horizontally with newer stuff towards the right).

Draw what you would like to have instead

You cannot change D but you can make new commits E and F, which you can arrange this way:

...--A--B--C--D     <-- ... we'll fill this in later ...
            \
             E--F   <-- ... likewise this ...

or this way:

             F      <-- ...
            /
...--A--B--C--D     <-- ...
            \
             E      <-- ...

Commit D will continue to be your "too big" commit while E might have just the HTML changes and F might have just the JS changes. (If F is built on E, then it really does have both changes and actually matches commit D in terms of contents. If F is built on C, then it has only the JS changes. It's up to you to decide how to arrange these.)

Each ... is to be filled in with a branch name. You can leave the existing branch name alone and invent one or two new branch names, and that's what I'll show first.

Doing it manually

Let's say you want two new branch names, and E and F each to have C as their parent (so, not C--E--F). Git being Git, there are many ways to do this, but one easy one is to create them with git checkout -b, which creates new branch names and also switches on to them (so that git status says you're on the new branch). This git checkout -b command also takes an optional commit specifier, which is the commit to have in the index and work-tree once the new branch is created. We want both E and F to spring forth from C, so we want to create the new branch "at" commit C:

git checkout -b zorg-html zorg~1

The name zorg identifies commit D. Adding a ~ suffix means "from this commit, step back across first-parent links however many times I say in the number". Since the number is 1 (one), we'll step back one parent, which takes us from D to C. This means the name zorg-html will currently point to commit C, and we'll be on this new branch.
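To see the ~ suffix in action, here is a small sketch on a throwaway repository. The commit subjects A through D mirror the diagram; the file name and user identity are made up for the demo.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/zorg
git config user.name "Example"              # hypothetical identity for the demo
git config user.email "example@example.com"

# Build the linear history A--B--C--D on zorg.
for name in A B C D; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done

git log -1 --format=%s zorg      # subject of the tip commit
git log -1 --format=%s zorg~1    # subject of its first parent
git log -1 --format=%s zorg~2    # two first-parent steps back

# Create the new branch at commit C and switch to it.
git checkout -q -b zorg-html zorg~1
git status --short --branch
```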

Now that we're on zorg-html (at commit C) we simply want to replace all the HTML files. The right version of those files is in commit D, as pointed-to by the name zorg. The easy-but-hard way to get those files is:

git checkout zorg -- first_file second_file third_file ...

which—this is a bit crazy of git checkout—this time doesn't change branches at all, but instead just extracts the specific named files (the list of file names after the -- part) from the specified commit (zorg, i.e., commit D).

If the HTML files all have names ending in .html, and every file named .html really is an HTML file, the easy version of this easy way is:

git checkout zorg -- '*.html' '**/*.html'

That is, get every file named whatever.html, at the top level or in any number of sub-directories, out of the zorg commit (commit D, again). Note that in a plain Git pathspec, * can also match directory separators, so '*.html' alone typically already covers sub-directories.

This kind of git checkout writes the updated files into both the index and the work-tree, so at this point you can simply git commit the result.
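A quick way to confirm that the extracted files land in both the index and the work-tree is to compare the staged and unstaged diffs afterwards. The sketch below uses a hypothetical repository with one nested HTML file, and only the '*.html' pattern, on the assumption that a plain pathspec lets * match across directories.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/zorg
git config user.name "Example"              # hypothetical identity for the demo
git config user.email "example@example.com"

# Commit C: the state before the too-big change (file names are made up).
mkdir sub
echo old > index.html; echo old > sub/page.html; echo old > app.js
git add . && git commit -q -m "C"

# Commit D: HTML and JS changed together.
echo updated > index.html; echo updated > sub/page.html; echo updated > app.js
git commit -q -am "D"

git checkout -q -b zorg-html zorg~1
git checkout zorg -- '*.html'

git diff --cached --name-only   # staged: the HTML files taken from zorg
git diff --name-only            # unstaged: nothing, index and work-tree agree
```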

Now, to create commit F, we repeat this whole process:

git checkout -b zorg-js zorg~1  # new zorg-js branch starting at C
git checkout zorg -- '*.js' '**/*.js'
git commit

(assuming, as before for the HTML files, that every JS file is named .js and no file named .js is something other than a JS file). And now we have:

             F      <-- zorg-js
            /
...--A--B--C--D     <-- zorg
            \
             E      <-- zorg-html

Obviously you can choose better names for all of these branches.
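Putting the manual recipe together, the following sketch runs the whole split on a throwaway repository and then lists what each new branch changed relative to C. The file names, commit messages, and user identity are hypothetical; only the branch names come from the diagrams above.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/zorg
git config user.name "Example"              # hypothetical identity for the demo
git config user.email "example@example.com"

# Commit C, then the too-big commit D.
mkdir sub
echo old > index.html; echo old > sub/page.html; echo old > app.js
git add . && git commit -q -m "C"
echo updated > index.html; echo updated > sub/page.html; echo updated > app.js
git commit -q -am "D: html and js together"

# Commit E: HTML changes only, parent C.
git checkout -q -b zorg-html zorg~1
git checkout zorg -- '*.html'
git commit -q -m "E: html only"

# Commit F: JS changes only, parent C.
git checkout -q -b zorg-js zorg~1
git checkout zorg -- '*.js'
git commit -q -m "F: js only"

# What each branch changed relative to C.
git diff --name-only zorg~1 zorg-html
git diff --name-only zorg~1 zorg-js
```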

If you wish to make commit F come after commit E, simply omit the second git checkout -b, the one that would create a new branch and switch back to commit C. This will leave you on branch zorg-html at commit E when you extract all the .js files and make commit F, so that F's parent will be E, and you will have:

...--A--B--C--D     <-- zorg
            \
             E--F   <-- zorg-html # zorg-html is clearly a bad name
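For the linear C--E--F arrangement, a sketch along the same lines follows; it also checks that F ends up with the same contents as D. Everything except the branch names is hypothetical.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/zorg
git config user.name "Example"              # hypothetical identity for the demo
git config user.email "example@example.com"

echo old > index.html; echo old > app.js
git add . && git commit -q -m "C"
echo updated > index.html; echo updated > app.js
git commit -q -am "D: html and js together"

git checkout -q -b zorg-html zorg~1
git checkout zorg -- '*.html'
git commit -q -m "E: html only"

# No second "git checkout -b": stay on zorg-html so that F's parent is E.
git checkout zorg -- '*.js'
git commit -q -m "F: js only"

# F should have the same contents as D, just reached in two steps.
git diff --quiet zorg zorg-html && echo "F matches D"
git log --format=%s zorg~1..zorg-html
```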

You can stop here if all you wanted are some simple recipes. If you want to learn lots of ways to deal with this and other issues, read on.

What if you want `E--F` on `zorg` itself?

No problem. Git being Git, there are multiple ways to do this. For instance, you can rename zorg before you start:

git branch -m zorg gary-oldman

Now you have this:

A--B--C--D   <-- gary-oldman

and you can safely create a new zorg.

Of course, any upstream setting sticks with the renamed branch. No big deal, you can use git branch --set-upstream-to to set new upstreams for each branch.

Of course, Git being Git, there's yet another way to do it! You can create a new branch name now, pointing to commit D, just to remember it as long as you need it—you'll need it for the two git checkout commands. Then you can git reset the branch name zorg so that it points to commit C:

git checkout zorg  # make sure zorg is the current branch
git branch temp    # save its tip commit under a new name
git reset --hard zorg~1  # and move zorg back to commit C

Now as you make new commits, they will move the name zorg forward but the name temp will still remember commit D for you:

A--B--C--D   <-- temp
       \
        E    <-- zorg

Now to get access to commit D you will use the name temp, and to re-find commit C you will use temp~1.
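The temp-branch variant can be sketched end to end as follows. The repository contents and user identity are hypothetical; the names zorg and temp are the ones used above.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/zorg
git config user.name "Example"              # hypothetical identity for the demo
git config user.email "example@example.com"

echo old > index.html; echo old > app.js
git add . && git commit -q -m "C"
echo updated > index.html; echo updated > app.js
git commit -q -am "D: html and js together"

git checkout -q zorg             # make sure zorg is the current branch
git branch temp                  # remember commit D under a new name
git reset -q --hard zorg~1       # move zorg back to commit C

git checkout temp -- '*.html'
git commit -q -m "E: html only"
git checkout temp -- '*.js'
git commit -q -m "F: js only"

# zorg now ends in E--F and has the same contents as the old commit D.
git diff --quiet temp zorg && echo "zorg matches the old tip"
git log --format=%s zorg
```

Once the result looks right, temp is no longer needed to reach C, since zorg~2 names it as well.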

Note that if you have extra commits "past" D (such as one made to save work done after the HTML and JS changes):

A--B--C--D--H--I--J   <-- temp, or zorg, or whatever

you can still do all of this. It's just that now, to name commit C, you will need either its SHA-1 hash "true name" (which never changes, but is really hard to type in correctly; mouse cut-and-paste is helpful here), or to count back from the tip. Here temp might name commit J, so temp~1 is commit I and temp~2 is H; then temp~3 is D and temp~4 is C. Once you're done splitting commits, you can cherry-pick the remaining commits (here H, I, and J) on top of the result.

Using `git rebase -i`

Git being Git, there is yet another way to do this, especially useful if there are commits after D, the commit to split. This particular one requires some comfort with Git, but in the end is the shortest and fastest method. We start with git rebase -i to rebase commit D (and any later commits) onto C, where it is (or they are) already; but we change the pick line for D to read edit.

Git now drops us into the rebase session with commit D already made. Now we want to git reset HEAD~1 (or git reset --mixed HEAD~1; --mixed is just the default) back to commit C. This sets the current commit—we're in detached HEAD mode, so this just adjusts HEAD itself—to C and resets the index to match C, but leaves the work-tree set up for D. Now we simply selectively git add the files we want: all the .html ones. Use any method you like (find ... | xargs git add or git add '*.html' '**/*.html' for instance) to add these, and git commit the result. Then git add the remaining files and git commit again, and then git rebase --continue to copy the remaining commits and move the branch label to the tip-most resulting commit.
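The rebase route can be rehearsed non-interactively on a throwaway repository. In the sketch below, GIT_SEQUENCE_EDITOR with sed is only a stand-in for changing pick to edit by hand in the todo list; the file names, commit messages, and user identity are hypothetical, and there is one later commit H after D.

```shell
set -e
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/zorg
git config user.name "Example"              # hypothetical identity for the demo
git config user.email "example@example.com"

echo old > index.html; echo old > app.js
git add . && git commit -q -m "C"
echo updated > index.html; echo updated > app.js
git commit -q -am "D: html and js together"
echo notes > notes.txt
git add notes.txt && git commit -q -m "H: later work"
before=$(git rev-parse zorg)     # remember the original tip for comparison

# Rebase D and H onto C, marking the first todo entry (commit D) as "edit".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -q -i zorg~2

# Stopped at D: step HEAD and the index back to C, keep D's files in the work-tree.
git reset -q HEAD~1
git add '*.html'
git commit -q -m "E: html only"
git add -A                       # the remaining files
git commit -q -m "F: js only"

# Replay the later commit(s) and move zorg to the new tip.
GIT_EDITOR=true git rebase --continue
git log --format=%s
```

Comparing the saved tip with the rebased branch (git diff "$before" zorg) is a cheap check that the split changed the history but not the final contents.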

Upvotes: 3