How do I remove a folder but keep all its subfolders and files, only in one Git branch?

Question

I am setting up a Git branch that needs to have a different directory structure from the rest of the branches. The files are all the same; however, they must essentially all be moved up one level. There are a lot of files, and I do not know how to move them all together using Git.

Currently, it looks something like this:

Root Directory
    |
    - Main Folder
    |    |
    |     - Sub Folder 1 with a lot of subfolders and files
    |    |
    |     - Sub Folder 2 with a lot of other subfolders and files
    |
    - A couple of random files

What I want to end up with:

Root Directory
    |
    - Sub Folder 1 with a lot of subfolders and files
    |
    - Sub Folder 2 with a lot of other subfolders and files
    |
    - A couple of random files

However, this change should be exclusively in one branch.

How would I do that?

torek · Accepted Answer

There are a few useful background things to know here before you start:

Git only stores files, not folders.
Every Git commit stores a complete snapshot of all of your files—or rather, all of your committed files. This is where what Git calls untracked files come in.
The snapshots are made from files that are stored in what Git calls, variously, your index or your staging area. (It has one more, older name, which is now supposed to be used for something else, but some parts of Git still refer to it as the cache. All three names refer to the same thing.)
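A minimal sketch of the first and third points, assuming a POSIX shell with Git on the PATH; the folder and file names are made up for illustration. An empty directory adds nothing to the index, which holds only file paths:

```shell
set -e
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q

mkdir -p empty-dir                 # a folder with no files in it
mkdir -p main-folder/sub1
echo hello > main-folder/sub1/file1

git add .                          # empty-dir contributes nothing
listing=$(git ls-files)            # the index: a flat list of file paths
echo "$listing"                    # main-folder/sub1/file1
```

Running git status afterwards would not mention empty-dir at all, since there is no file in it for Git to track.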

Git stores these files in commits. Git is really all about commits. Each commit is numbered—but not in a nice easy sequential "commit #1, commit #2, ..." fashion. Instead, each commit gets a unique hash ID, with the hash IDs appearing completely random and unrelated to previous commits. These hash IDs are the big ugly strings of letters-and-numbers like 83232e38648b51abbcbdb56c94632b6906cc85a6 that git log spits out.
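A throwaway repository with two commits is enough to see these hash IDs. This sketch assumes a POSIX shell and a repository in Git's default SHA-1 format, where each ID is 40 hex digits; the committer name and email are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"               # placeholder identity
git config user.email "demo@example.com"

echo one > file
git add file
git commit -q -m "first"
echo two > file
git commit -q -am "second"

git log --format=%H          # two full hash IDs, newest first
first=$(git rev-parse HEAD~1)
second=$(git rev-parse HEAD)
```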

Since every file is in every commit, it's important for Git to save them in a way that doesn't use up your entire disk drive right away. So the saved files are compressed and, further, shared across different commits. Git can do this because it uses a special, Git-only, freeze-dried format to store the files. Files in this form cannot be changed, but can be shared. This means that no existing commit can be changed. Every commit in your repository is archived, more or less permanently.¹ Think of commits as permanent (they mostly are) and unchangeable. They are the history, stored in the repository.
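To check the sharing claim, this sketch (POSIX shell, placeholder file names and identity) commits twice, changing only one file, then asks Git which stored object each commit uses for each file via git rev-parse COMMIT:PATH:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "never changes" > stable.txt
echo "v1" > changing.txt
git add .
git commit -q -m "first"
echo "v2" > changing.txt
git commit -q -am "second"

# stable.txt is in both snapshots, but both commits point at one stored blob
stable_1=$(git rev-parse HEAD~1:stable.txt)
stable_2=$(git rev-parse HEAD:stable.txt)
changing_1=$(git rev-parse HEAD~1:changing.txt)
changing_2=$(git rev-parse HEAD:changing.txt)
echo "stable.txt:   $stable_1 / $stable_2"
echo "changing.txt: $changing_1 / $changing_2"
```

The unchanged file resolves to one ID in both commits, while the edited file gets a new one.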

¹It's possible to remove commits, but it's a bit hard and Git generally doesn't do it right away either—so even if you think a commit is gone, and can't find it right away, it's probably still in there.

Getting work done

Now, this is all well and good for archiving, but these read-only freeze-dried files are completely useless for getting any actual work done. For that, Git provides what Git calls a work-tree. That's simply the place where you do your work.

Into the work-tree, Git extracts the freeze-dried files from some commit, rehydrating them so that they have their normal everyday form. You can now see and work with these files. You simply pick one commit—usually, the last commit on some branch—and say: Get me that commit, and Git does that. It finds the frozen commit and enumerates all the files in it:

main-folder/sub1/file1: Aha, says Git, this work-tree has no main-folder, let's make one. And it has no sub1 in the main-folder I just made, let's make that too. Now I can create new file main-folder/sub1/file1.
main-folder/sub1/file2: Hey, says Git, there's already a main-folder/sub1, I can just create new file file2 in there.

This process repeats as needed: Git has files, as listed in the commit, that it has to reconstitute. When it's done with that, if the work-tree was empty when you started, well, now it has the rehydrated version of every file from that commit. No folders were stored, but there was no need to store them.
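The rehydration step can be watched directly. This sketch, assuming a POSIX shell and made-up names, deletes the folders from the work-tree and lets Git recreate them from the commit with git checkout HEAD -- ., which extracts the listed files without switching branches:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir -p main-folder/sub1
echo one > main-folder/sub1/file1
echo two > main-folder/sub1/file2
git add .
git commit -q -m "Two files in nested folders"

rm -rf main-folder            # delete the folders from the work-tree only
git checkout HEAD -- .        # re-extract every file from the commit
ls main-folder/sub1           # file1 file2: the folders were recreated on demand
```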

If you now switch from that commit to a different commit, Git will remove all the files it created just for that commit, and replace them with files for the other, different commit. If it removes all the files from main-folder/sub1, it also removes the directory main-folder/sub1. If it winds up removing everything in main-folder, it removes that too. Then it goes about extracting all the files from the commit you want now, creating any directories/folders as required.
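A sketch of the cleanup on switching, under the same assumptions (POSIX shell; extra is a hypothetical branch name). The commit on extra adds main-folder/sub1/file1; switching back to a commit that lacks it removes the file and then the folders it left empty:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo top > README
git add README
git commit -q -m "base"
base=$(git rev-parse --abbrev-ref HEAD)   # default branch, whatever its name

git checkout -q -b extra                  # hypothetical branch name
mkdir -p main-folder/sub1
echo one > main-folder/sub1/file1
git add main-folder
git commit -q -m "Add a nested folder"

git checkout -q "$base"                   # this commit has no main-folder at all
if [ -d main-folder ]; then echo "main-folder still here"; else echo "main-folder removed"; fi
```

If an untracked file were sitting in main-folder/sub1, Git would remove only file1 and leave the folder in place.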

In fact, Git interleaves all of this work, creating and deleting, and optimizing: if you switch from commit a123456... to commit b789abc..., and 99% of the files in the two commits are the same, well, there's no need to go mucking with them in the work-tree after all, is there? And, with this particular form of git checkout, Git adds a safety check before switching commits: For each file I have to remove or replace, is the file in the work-tree "clean"? If the file is "clean", it's safe to remove or replace it. If it's "dirty"—if you've changed it since Git extracted it, and you might want to keep your changes that switching would clobber—Git will warn you about this and, by default, refuse to switch commits.
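The safety check can be triggered on purpose. In this sketch (POSIX shell; other is a hypothetical branch name), notes.txt differs between the two branches, so an uncommitted edit to it makes git checkout refuse to switch:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "v1"
base=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b other                  # hypothetical branch name
echo "v2" > notes.txt
git commit -q -am "v2"
git checkout -q "$base"                   # clean work-tree: switching is allowed

echo "my uncommitted edit" > notes.txt    # notes.txt is now "dirty"
if git checkout -q other 2>/dev/null; then
  switched=yes
else
  switched=no                             # refused: the edit would be clobbered
fi
echo "switched: $switched"
```

Had notes.txt been identical in both commits, the switch would have gone ahead and simply carried the edit along.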

The index / staging-area

There is one huge honking wrinkle in this process. Reading through the above, you'd think: Okay, we have commits with freeze-dried files, and the work-tree with normal files. But there's a third entity that Git puts between these two. This is the index / staging-area.

Like the commits themselves, the index is mostly invisible. It's actually just a plain file, .git/index in most cases (this gets more complicated eventually, but it starts out as just this plain file). What's in the file is, in essence, a copy of the commit you extracted: all the freeze-dried files, each identified only by a hash ID (like commit hash IDs). Unlike the actual frozen files in commits, though, the copies that are in the index can be changed.²
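To peek at the index, git ls-files --stage prints its entries. A minimal sketch, assuming a POSIX shell and made-up paths (no commit is needed, because git add fills the index directly):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir -p main-folder/sub1 main-folder/sub2
echo one > main-folder/sub1/file1
echo two > main-folder/sub2/file2
git add .

ls -l .git/index         # the index is a single ordinary file
git ls-files --stage     # mode, blob hash ID, stage number, full path
entries=$(git ls-files)
```

Each line pairs a full path with a blob hash ID; there are no separate entries for main-folder, sub1 or sub2.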

This is what git add does: it freeze-dries the file and sticks that version in the index. If the file wasn't in the index before, well, now it is. If it was in the index before, this kicks out the previous version. In either case, the new freeze-dried file is ready to be committed. When you run git commit, Git just packages up all the ready-to-go files from the index, into the new commit. That's why git commit is so fast: it really has very little work left to do.
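The replace-in-the-index behavior can be traced by reading the index entry before and after git add. This sketch assumes a POSIX shell; file1 and the identity are placeholders:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "first version" > file1
git add file1
before=$(git ls-files -s file1 | awk '{print $2}')   # blob ID now in the index
git commit -q -m "first"

echo "second version" > file1                        # edit only the work-tree copy
still=$(git ls-files -s file1 | awk '{print $2}')    # index entry is unchanged
git add file1                                         # freeze-dry and replace the entry
after=$(git ls-files -s file1 | awk '{print $2}')
git commit -q -m "second"                             # packages the index as-is
committed=$(git rev-parse HEAD:file1)
echo "$before -> $after"
```

Editing the file alone leaves the index entry untouched; only git add swaps in the new blob, and git commit then records exactly that blob.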

The files that are in the index aren't stored in folders at all. There's just one giant list: file path/to/file1 has these freeze-dried contents, file path/to/file2 has these other freeze-dried contents, and so on. But one way or another, the presence of the file in the index, along with its freeze-dried ready-to-commit content, is what makes the rehydrated file in the work-tree tracked. A tracked file is one that is in the index, so an untracked file is simply any file that is in the work-tree but not in the index. Since git commit archives what's in the index, not what's in the work-tree, only tracked files get committed.
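A short sketch of tracked versus untracked, assuming a POSIX shell and invented file names: only the file that was added ends up in the commit, while the other stays behind in the work-tree.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "tracked" > kept.txt
echo "scratch" > scratch.txt
git add kept.txt                                         # only kept.txt enters the index

tracked=$(git ls-files)                                  # kept.txt
untracked=$(git ls-files --others --exclude-standard)   # scratch.txt
git commit -q -m "Commit whatever the index holds"
committed=$(git ls-tree --name-only HEAD)                # kept.txt only
echo "tracked: $tracked / untracked: $untracked / committed: $committed"
```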

²The tricky part here is that putting a new file into the index actually freeze-dries the file and stores it in the repository, creating a new hash ID if the new contents are truly new, or sharing some existing hash ID if the freeze-dried contents match any existing file. Now that the supposedly-new file has been reduced to just a hash ID, it fits in the same slot in the index that the old file occupied!
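Footnote 2's claim, that git add already stores the freeze-dried copy in the repository, can be checked before any commit exists. A minimal sketch, assuming a POSIX shell and placeholder names:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q                      # no commits are made in this sketch

echo "brand new contents" > file1
git add file1                    # writes a blob object right now
id1=$(git ls-files -s file1 | awk '{print $2}')
git cat-file -t "$id1"           # blob
git cat-file -p "$id1"           # brand new contents

echo "brand new contents" > file2
git add file2                    # same contents, so the existing blob is reused
id2=$(git ls-files -s file2 | awk '{print $2}')
```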

With that out of the way, the answer is now easy

To make a new commit that, in that commit, stores only certain files, just set things up so that your index has just those files in it. To do that, remove all the totally-unwanted files from your index, which will also remove the work-tree copies:

git rm ...
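A sketch of that step, assuming a POSIX shell; unwanted is a hypothetical folder standing in for whatever should disappear from the branch. The -r option is needed to remove a folder's files recursively, and git rm --cached would drop the files from the index only, leaving the work-tree copies in place:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir -p main-folder/sub1 unwanted
echo keep > main-folder/sub1/file1
echo junk > unwanted/junk.txt        # hypothetical unwanted content
git add .
git commit -q -m "Before cleanup"

git rm -r -q unwanted                # removes the index entries and the work-tree files
remaining=$(git ls-files)            # main-folder/sub1/file1
```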

Since the index stores them by their path relative to the top of the work-tree, you'll want to save all the files you want to keep somewhere. The easiest way to do that is to rename them in both the work-tree and the index:

git mv main-folder/sub1 sub1

which will create (by renaming, in this case) the sub1 folder in your work-tree if needed, then rename all the tracked files in the index (remember that git mv has to work with the index as well as the work-tree) from their main-folder/sub1/file1 etc. paths to sub1/file1 etc. paths. The git mv command, like the git rm command, then drags the work-tree files along with it.
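Putting it together for the layout in the question, here is a sketch that assumes a POSIX shell and uses placeholder names (main-folder, sub1, sub2, random1.txt, and a hypothetical branch called flat-layout). Only the new branch gets the flattened layout; the original branch is untouched:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Recreate the question's layout (all names are placeholders)
mkdir -p main-folder/sub1/deeper main-folder/sub2
echo a > main-folder/sub1/deeper/a.txt
echo b > main-folder/sub2/b.txt
echo r > random1.txt
git add .
git commit -q -m "Original layout"
base=$(git rev-parse --abbrev-ref HEAD)   # default branch, whatever its name

# Restructure on a separate branch only
git checkout -q -b flat-layout
git mv main-folder/sub1 sub1
git mv main-folder/sub2 sub2
rmdir main-folder 2>/dev/null || true     # drop the empty leftover folder, if any
git commit -q -m "Move sub-folders up one level"

flat=$(git ls-tree --name-only flat-layout)   # random1.txt sub1 sub2
orig=$(git ls-tree --name-only "$base")       # main-folder random1.txt

# Switching back restores the original layout in the work-tree
git checkout -q "$base"
```

git mv also accepts several sources followed by an existing destination directory, so the two git mv lines could be written as one: git mv main-folder/sub1 main-folder/sub2 .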

(Conveniently, or sometimes inconveniently, when git mv renames the folder in place, that also renames any untracked files within it. Since the rest of Git isn't really interested in untracked files, a later git checkout won't move them back!)
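A quick check of that side effect, assuming a POSIX shell and made-up names: an untracked scratch.log inside the folder moves along with the tracked file, but it stays untracked.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir -p main-folder/sub1
echo tracked > main-folder/sub1/file1
git add .
git commit -q -m "base"
echo "not in Git" > main-folder/sub1/scratch.log   # untracked file

git mv main-folder/sub1 sub1     # renames the whole directory on disk
ls sub1                          # file1 scratch.log
tracked=$(git ls-files)          # sub1/file1 only
```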

Since, underneath everything, Git stores files by content—using the hash IDs of the freeze-dried files—all of this renaming is mostly free. Git needs a little bit of space to store the updated names—commits have to store the full names along with the hash IDs, and the names can't easily be shared here as they're different³—but the actual contents are shared with the files that have different names in other commits.
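To confirm the contents really are shared, this sketch (POSIX shell, placeholder names) commits a move and compares the blob IDs on both sides; git diff with -M also reports the change as a 100% rename:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir -p main-folder/sub1
echo "line one" > main-folder/sub1/file1
git add .
git commit -q -m "base"
git mv main-folder/sub1 sub1
git commit -q -m "Move sub1 up"

git diff --name-status -M HEAD~1 HEAD    # R100 main-folder/sub1/file1 sub1/file1
old_blob=$(git rev-parse HEAD~1:main-folder/sub1/file1)
new_blob=$(git rev-parse HEAD:sub1/file1)
```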

Note that when you switch back and forth between this commit, with these sub1/file1 type names, and any commit that has the main-folder/sub1/file1 type names, Git may well have to churn hard on your work-tree, removing all the sub1/file1 names first, then creating new, empty main-folder and main-folder/sub1 directories to hold the (in the end, same!) files that used to be in sub1/file1 and so on. If Git is ever clever enough to realize that it could just rename those files in the work-tree, it might do that, but the easy dumb way, which is what Git usually starts with, is just to remove and re-create them. This will show up in the OS-level file time-stamps: if Git removes a file and re-creates it, the work-tree file gets "now" as its on-disk time-stamp.

³Inside commits—but not in the index—Git goes right back to a tree-structured naming scheme. So if sub1 in the top level of this new commit is 100% identical to the sub1 that was in main-folder/sub1 of some other commit, Git will actually share the underlying tree object for the sub1 sub-tree of the root tree of the new commit. The root tree will of course be different as it will name sub1 as one of its sub-trees, and not name main-folder as one of its sub-trees. But all of this is mere implementation detail: none of it shows up in the index and work-tree.
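The tree sharing described in footnote 3 can be checked the same way. In this sketch (POSIX shell, placeholder names), the sub1 tree object keeps the same ID after the move, while the root tree changes:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir -p main-folder/sub1/deeper
echo one > main-folder/sub1/file1
echo two > main-folder/sub1/deeper/file2
echo r > random1.txt
git add .
git commit -q -m "base"
git mv main-folder/sub1 sub1
git commit -q -m "Move sub1 up"

old_sub=$(git rev-parse HEAD~1:main-folder/sub1)   # tree object for the old sub1
new_sub=$(git rev-parse HEAD:sub1)                 # tree object for the new sub1
old_root=$(git rev-parse "HEAD~1^{tree}")
new_root=$(git rev-parse "HEAD^{tree}")
git cat-file -t "$new_sub"                         # tree
```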
