Reputation: 236

Git patch from several (not all) unstaged files

I have read about Git's patch commands, but all of the examples show how to get a patch for all unstaged or cached files, or for all files that some commits include (from one commit to another). But suppose I have 10 unstaged files and I need to create one proper patch for just 6 of them. How can I do this? And if there is a way to create such a patch, how can I apply it?
Sorry if obvious.

Upvotes: 1

Answers (1)

torek

Reputation: 489828

You'd have to define the phrase proper patch first. What makes a patch proper? What makes one improper? For that matter, what's the difference between a patch and a diff? There is no one fixed answer, but see Difference between patch and diff files.

That said, git diff produces diffs, and is quite flexible. It's meant for human use, not for use by machines, so its output may or may not be what you want (especially since you did not define any of your terms). The git format-patch program is less flexible and meant more for use by machines: it produces output that can be digested easily by git apply, which is meant to apply a single patch without committing it, or git am, which is meant to apply a whole series of patches stored on your computer in "mailbox format", committing each one. (am here is short for Apply Mailbox, more or less.)

Because you haven't defined your terms, there's no single right answer to your question. If we assume you mean produce a file that git apply can apply, we get one possible answer. If we assume you mean produce a mailbox-format patch that git am can apply and commit, we get a different answer.

The git format-patch command will produce a mailbox formatted patch (or series of patches) from some commit or commits. So to use it, you must make a commit. You can simply commit those particular files you wish to have in your patch, on a new branch if you like. (See the long details below.)

The git diff program, or any of its more machine-oriented relatives (git diff-tree, git diff-files, git diff-index, each of which needs -p to emit patch text instead of its default raw format), will produce a human-readable diff. If it is not colorized, it will be suitable for use with git apply. To use these, you need not make a commit.

As LeGEC noted, you can use git diff on specific files. Note that by default, git diff -- paths compares the index copy of each specified path to the work-tree copy of the same path. This is probably what you want. If you have configured git diff to always produce colorized output, turn that off for the duration of the one git diff operation. Save the output somewhere:

git diff -- file1 ... fileN > /tmp/patch

or:

git diff --color=never -- file1 ... fileN > /tmp/patch

if you need to disable colorization.
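To cover the "how can I apply it" half of the question, here is a minimal sketch of the whole round trip in a throwaway repository. The file names a.txt, b.txt and c.txt are hypothetical, and the git checkout step merely simulates a tree that does not have the changes yet:

```shell
set -e
# Throwaway repository; every file name here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Commit three files, then modify all three without staging them.
for f in a.txt b.txt c.txt; do echo "old" > "$f"; done
git add .
git commit -q -m "initial"
for f in a.txt b.txt c.txt; do echo "new" > "$f"; done

# Write a patch covering only two of the three modified files.
patch_file=$(mktemp)
git diff --color=never -- a.txt b.txt > "$patch_file"

# Throw away the work-tree changes to simulate a pristine checkout,
# then verify that the patch applies cleanly and apply it.
git checkout -q -- .
git apply --check "$patch_file"
git apply "$patch_file"

# List the files the patch changed; c.txt was not in the patch.
changed=$(git diff --name-only)
echo "$changed"
```

Running git apply --check first is optional, but it reports problems without touching any file, which is useful when the patch travels to a different clone.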

Long: about commits

If the above suffices, there's no need to read the rest of this.

If your definition of proper patch means one that git am can turn into a commit, you will need to make a new commit. This is where knowing what staged vs not staged means, and how branch names and commits work, becomes very important.

Git is really, at its heart, all about commits. Branches—or more specifically, branch names—are useful, especially if you are a human, but they're not really what Git is about. Git is all about the commits.

Every commit is numbered, but the numbers are not simple sequential counts. We can't find commit #1, then commit #2, and so on. Instead, each commit gets a unique hash ID. This thing looks random, but in fact is entirely non-random, and is carefully computed so that every Git will number a 100%-identical, bit-for-bit-the-same commit the same way. That way, every Git everywhere will agree that this commit gets this hash ID, and no other commit gets this hash ID.
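You can watch this determinism happen. The sketch below builds the same commit twice, in two unrelated throwaway repositories, with the author, committer, dates, message and file content all pinned to the same (made-up) values. It assumes both repositories use the same hash algorithm, which is the case for two plain git init runs on one machine:

```shell
set -e
# Pin everything that goes into a commit so both runs see identical input.
# All of these values are made up for the demonstration.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
export GIT_AUTHOR_DATE="1577836800 +0000" GIT_COMMITTER_DATE="1577836800 +0000"

# Create a fresh repository with one commit and print that commit's hash ID.
make_commit() {
    dir=$(mktemp -d)
    ( cd "$dir"
      git init -q
      echo "hello" > file.txt
      git add file.txt
      git -c commit.gpgsign=false commit -q -m "same message"
      git rev-parse HEAD )
}

hash_one=$(make_commit)
hash_two=$(make_commit)
echo "$hash_one"
echo "$hash_two"
```

The two printed hash IDs are identical. Change any input, even the timestamp by one second, and the hash ID changes too.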

What goes into a commit comes in two parts: the data, and the metadata:

The data in a commit is pretty simple: it's a full snapshot of every file that Git knows about.

This would seem to make Git repositories grow enormously fat, since every commit stores every file. But they don't (grow enormously fat). The reason is that the files are stored in a special, read-only, Git-only, compressed and de-duplicated format. If you make a thousand commits, each with 1000 files, but re-use 999 of the 1000 files each time, then all thousand commits share 999 files with other commits.

(There is more to it than this, but the de-duplication is the first step, and a very big one.)

The metadata in a commit is information about the commit. This is where Git stores the name and email address of the person who made the commit, for instance. In this metadata, Git stores one particular set of information that Git needs for itself, though. Every commit stores the commit number (the hash ID) of its parent commit. This is how history exists in a repository.
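If you want to see both parts for yourself, git cat-file -p prints a commit's metadata and git ls-tree prints the snapshot it refers to. This is a sketch in a throwaway repository with a hypothetical file.txt:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Two commits, so that the second one has a parent.
echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" > file.txt
git add file.txt
git commit -q -m "second"

# Metadata: tree, parent, author, committer, then the log message.
commit_text=$(git cat-file -p HEAD)
echo "$commit_text"

# Data: the files in the snapshot that the commit points to.
git ls-tree -r HEAD

# The "parent" line names the previous commit's hash ID.
parent_line=$(echo "$commit_text" | grep '^parent ')
previous=$(git rev-parse HEAD~1)
```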

Since the files in the commits are read-only—and Git-only, frozen-for-all-time, compressed into this special freeze-dried format that only Git itself can use—the files you work with have to be extracted from commits. This is what git checkout (or, since Git 2.23, git switch) does. You pick a commit and tell Git: Extract all the files from this commit, so that I can see them and work with / on them.

Since each commit remembers its previous commit, all you need to do, to use Git, is have Git remember for you the unique hash ID of the last commit. This is where branch names come in. A branch name simply holds the hash ID of the last commit that is to be treated as part of that branch.

This means we can draw a branch like this:

... <-F <-G <-H   <--master

The name master holds the actual hash ID of the last commit, which at this point is hash ID H. That commit holds a snapshot of all of your files, and also holds the hash ID of earlier commit G.

Git can look up any commit—any internal object, really—by its hash ID, so Git can look up commit H, extract all of its files, and let you work on it. Or, Git can look up commit H, find its parent hash ID G, and look up commit G. Git can then extract G's files, or look up its parent hash ID F. This is the first of the big secrets of Git.
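Here is that backwards walk done by hand, as a sketch. It makes three commits whose messages are F, G and H (to match the drawing), asks the branch name for the last hash ID, and then follows parent links until there are none left:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Name the initial branch master regardless of the local default.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

for msg in F G H; do
    echo "$msg" > file.txt
    git add file.txt
    git commit -q -m "$msg"
done

# The branch name holds the hash ID of the last commit (H).
tip=$(git rev-parse master)

# Follow the parent links: H, then G, then F.
walked=""
current=$tip
while [ -n "$current" ]; do
    walked="$walked$(git log -1 --format=%s "$current")"
    # %P prints the parent hash ID; it is empty for the very first commit.
    current=$(git log -1 --format=%P "$current")
done
echo "$walked"
```

This is, in essence, what git log does on its own, except that git log also copes with merge commits, which have more than one parent.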

Commits, your work-tree, and the index

Given the above—read-only commits, and read/write files—we have already seen that Git must extract a commit into an area where you can see and work on your files. This area is your working tree or work-tree. So there are two copies of each file of interest: the one in the current commit, which is frozen for all time, and the one in your work-tree, which you can use. It's pretty straightforward, with one twist: you can create files in this area that Git doesn't know about.

The really tricky part here, though, is how Git makes a new commit, and what files Git does know about. You might think that Git would just keep a list of file names, for instance, and use your work-tree files to make new commits ... but it doesn't.

Instead, Git keeps a third copy—well, a de-duplicated copy, in the freeze-dried format—in an in-between place. Between the current commit and your work-tree, Git has this other "copy" (already de-duplicated, so not exactly a copy) of each file that Git took out of the commit at checkout time.

These in-between "copies" of each file are in what Git calls, variously, the index, or the staging area, or (rarely these days) the cache. Note that these copies are ready to go into a new commit, as they're already in the Git-only, freeze-dried format. Unlike the copies in the commits themselves, they can be replaced, though.

This is what git add is all about. The git add command means Make the index copy of some file(s) match the work-tree copy. If you have changed a file in your work-tree, you must tell Git to copy the updated file back into Git's index.
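You can look at the index directly with git ls-files --stage, which prints the mode, blob hash ID, stage number and path of each entry. The following sketch (hypothetical file.txt, throwaway repository) shows that editing a work-tree file leaves the index alone, and that git add is what replaces the index copy:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "initial"

# Index entry right after the commit.
before=$(git ls-files --stage file.txt)

# Editing the work-tree file does not touch the index...
echo "v2" > file.txt
after_edit=$(git ls-files --stage file.txt)

# ...but git add replaces the index copy with the work-tree content.
git add file.txt
after_add=$(git ls-files --stage file.txt)

printf '%s\n%s\n%s\n' "$before" "$after_edit" "$after_add"
```

The first two lines of output are the same; the third carries a different blob hash ID.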

This is what a staged file is. At all times, Git's index has copies of each file that Git knows about. If it's in the index, it's ready to be committed—but maybe it's the same as the file in the current commit! If so, it's already de-duplicated, and Git can easily tell that it's the same.

If the index copy of a file is different from the current commit copy, or is entirely new, then what will go into the next commit is different from what is in the current commit. Git calls that staged. If the index copy is the same as the committed copy, though, Git says nothing at all.

Meanwhile, the index copy of a file might match the work-tree copy, or not. If the index copy does match the work-tree file, Git doesn't say anything about it. If not, Git says that the file is unstaged.

This means a file can be both staged for commit and not staged for commit, at the same time! If the committed copy (which can't be changed) doesn't match the index copy, and the index copy doesn't match the work-tree copy, you have a file that is both staged for commit and not staged for commit. You can get this state by doing:

git checkout somebranch
edit file.ext             # change something in a file
git add file.ext          # copy the updated file back into Git's index
edit file.ext             # change something else in the same file
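A runnable variant of the same sequence, as a sketch: the "edit" steps are replaced by echo so that no editor is needed, and the repository is a throwaway one.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > file.ext
git add file.ext
git commit -q -m "initial"

echo "first change" > file.ext     # change something in the file
git add file.ext                   # copy the updated file into the index
echo "second change" > file.ext    # change the same file again

# First status column: index vs current commit. Second: work-tree vs index.
status=$(git status --short)
staged=$(git diff --cached --name-only)   # index vs current commit
unstaged=$(git diff --name-only)          # work-tree vs index
echo "$status"
```

The short status shows M in both columns for file.ext, and the file appears in both the staged and the unstaged list.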

When you run git commit, what Git does is to make a new commit from whatever is in Git's index at that time. So if you have ten unstaged files right now, and you git add six of them and then run git commit, you get a new commit in which:

six files don't match the previous commit (the one that's current before you run git commit), but
all the other files do match the previous commit.

Now that you have made the new commit, the new commit is the current commit. You made it from the files that are in the index, so all files in the new, now-current commit match all the files that are in the index. No files are "staged for commit", but the four files you didn't git add are still there in your work-tree, still different from the corresponding four files in the index and those four files in the current commit. So these four files are still "not staged for commit".

If you like, you can now git add these four files to copy the work-tree versions back into Git's index, and then git commit the result. You now have two new commits that you did not have before. The last one matches your work-tree, so that the current commit, Git's index, and your work-tree all match: no files are staged and no files are unstaged.
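The ten-files case from the question can be played through in a sketch like this one. The names file1.txt through file10.txt are hypothetical stand-ins for your ten modified files:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Commit ten files, then modify all ten without staging anything.
for i in 1 2 3 4 5 6 7 8 9 10; do echo "old" > "file$i.txt"; done
git add .
git commit -q -m "initial"
for i in 1 2 3 4 5 6 7 8 9 10; do echo "new" > "file$i.txt"; done

# Stage and commit only six of them.
git add file1.txt file2.txt file3.txt file4.txt file5.txt file6.txt
git commit -q -m "six files"

# Files that differ between the previous commit and the new one.
in_commit=$(git diff --name-only HEAD~1 HEAD | wc -l | tr -d ' ')
# Files whose work-tree copy still differs from the index copy.
still_unstaged=$(git diff --name-only | wc -l | tr -d ' ')
echo "changed in new commit: $in_commit, still not staged: $still_unstaged"
```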

More about branch names

Note that each time you do make a new commit, Git has to update the current branch name. Suppose you're on your master branch initially, like this:

...--F--G--H   <-- master

Now you create a new branch name, such as feature. This new name also identifies commit H. We'll add the name HEAD, in all capital letters, to one of the branch names to show which one we're using:

...--F--G--H   <-- feature (HEAD), master

Now we'll git add some file(s) and make a new commit. It will get a new random-looking hash ID; we'll just call this I:

...--F--G--H   <-- master
            \
             I

The trick is that Git now writes I's hash ID into the name feature (the one HEAD is attached to), so that the name points to I now:

...--F--G--H   <-- master
            \
             I   <-- feature (HEAD)

If we add some more files and git commit again, we get another new commit J:

...--F--G--H   <-- master
            \
             I--J   <-- feature (HEAD)
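The drawings can be checked against a real repository. This sketch creates a single commit standing in for H on master, creates feature, and makes one commit standing in for I; only the name that HEAD is attached to moves:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Name the initial branch master regardless of the local default.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "H" > file.txt
git add file.txt
git commit -q -m "H"

# A new branch name initially identifies the same commit as master.
git checkout -q -b feature
feature_before=$(git rev-parse feature)

echo "I" > file.txt
git add file.txt
git commit -q -m "I"

# Only feature has moved; master still names H, which is I's parent.
master_tip=$(git rev-parse master)
feature_tip=$(git rev-parse feature)
feature_parent=$(git rev-parse feature~1)
echo "master:  $master_tip"
echo "feature: $feature_tip"
```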

Note that each commit has a full snapshot of every file, as it appeared in Git's index at the time you ran git commit. When you use git format-patch to turn a commit into a patch, Git:

extracts the files from the commit's parent (e.g., H for I);
extracts the files from the commit (I);
compares the extracted files; and
tells you which files are different, giving you a recipe for changing the older versions into the newer ones.

Since this is a commit-able patch, Git adds a header to it giving the name and email address of the person who made the commit, an appropriate date-and-time stamp, and the log message in which whoever made the commit explains why they made that commit. The patch itself comes after this header.
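Putting the commit route together, here is a sketch under a few assumptions: a throwaway repository, a hypothetical a.txt, and master standing in for wherever the patch is to be applied (in practice that would usually be someone else's clone). It commits the chosen file on a new branch, turns that commit into a mailbox-format patch, and applies it with git am:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Name the initial branch master regardless of the local default.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "old" > a.txt
git add a.txt
git commit -q -m "initial"

# Commit the change you want in the patch, on a new branch.
git checkout -q -b feature
echo "new" > a.txt
git add a.txt
git commit -q -m "Change a.txt only"

# Turn the newest commit into one mailbox-format patch file.
# git format-patch prints the name of the file it writes.
out=$(mktemp -d)
patch_file=$(git format-patch -1 -o "$out" HEAD)

# The header precedes the diff itself.
subject=$(grep '^Subject:' "$patch_file")
grep -E '^(From|Date|Subject):' "$patch_file"

# Apply the patch elsewhere; git am makes the commit for you.
git checkout -q master
git am -q "$patch_file"
applied=$(git log -1 --format=%s)
echo "$applied"
```

The commit that git am creates on master has the same author, message and changes as the one on feature, but its own hash ID, since its committer timestamp (and, in general, its parent) differs.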