Reputation: 14610
I had to run
git reset --soft HEAD^
to undo a commit with large files (same issue). Now I can see my files again in the VS Code Source Control Explorer (see below).
Problem - I want to keep these files from being added to my repo when committing and then pushing, so I added
/.angular/cache
to my .gitignore file, but that didn't remove the files from the Source Control window.
Question - Do I need to do something else to remove these files from Source Control, e.g., unstage each file individually?
Source Control in VS Code:
Upvotes: 1
Views: 1795
Reputation: 490058
As chepner suggested in a comment, you probably really wanted a --mixed
reset, not a --soft
reset. However, as j6t added, you can recover from this error by using git rm --cached -rf .angular/cache
(be sure to use the --cached
to avoid removing the working tree copies).
You will still want to create or update your .gitignore
so that you can't accidentally add the .angular/cache
contents again. You should git add
the file after updating it (or creating it with appropriate initial contents).
Had you used --mixed
(or the default which is --mixed
), you might have had to add some other files besides the .gitignore
, but you could update the .gitignore
file first, then use a standard "add everything" (git add .
) to add everything except the current untracked-and-ignored files. This tends to be easier to get right, which is why it's the recommended method. But adding everything, then un-adding (git rm --cached -rf
) the unwanted files, also works. It's just clunky and easy to get wrong.
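As a rough end-to-end sketch of the recovery described above, here is the whole sequence in a throwaway repository. The file names (app.txt, .angular/cache/big.bin) and the commit identity are made up for illustration, and it assumes git is installed; it is not a transcript of the asker's actual repository.

```shell
set -e
# Throwaway repository; file names and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > app.txt
git add app.txt
git commit -q -m "initial commit"

# Reproduce the mistake: commit the cache directory by accident.
mkdir -p .angular/cache
echo "pretend this is huge" > .angular/cache/big.bin
git add .
git commit -q -m "oops: includes .angular/cache"

# Undo the commit. --soft moves the branch name but leaves the index alone,
# so the cache file is still staged afterwards.
git reset -q --soft HEAD^
staged_after_soft=$(git diff --cached --name-only)

# Take the cache files out of the index only; --cached keeps the working tree copies.
git rm -q -r --cached .angular/cache

# Ignore the directory from now on, and stage the .gitignore itself.
echo "/.angular/cache" > .gitignore
git add .gitignore
staged_after_fix=$(git diff --cached --name-only)

echo "staged after soft reset: $staged_after_soft"
echo "staged after the fix:    $staged_after_fix"
```

After the last step only .gitignore is staged, while the cache file is still on disk.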
Git is all about commits. Git is not about files, although Git commits hold files, and Git is not about branches, although Git branch names help you (and Git) find the commits. As such, you need to know what a Git commit is and does for you, and how git commit
makes a new commit. So let's touch lightly on commits first.
A Git repository is, at its heart, just two databases. One database holds commits and other supporting objects, and a separate database holds names: branch names, tag names, and other names that help Git find the commits. It's the commits-and-other-objects database that really matters: you can use a repository in which the names database is completely empty (it's just extremely awkward and unpleasant to do that), but you can't use a repository in which the objects database is empty.
Ignoring the supporting objects here, we'll just talk about the commit objects, since those are the ones you interact with. All the objects, including the commits, are numbered, but their numbers are big ugly random-looking things, such as f01e51a7cfd75131b7266131b1f7540ce0a8e5c1
. The commit hash IDs are always totally unique. So if you have this commit (the one that starts with f01e51a7cf
, that I linked to the GitHub copy of it), it's literally this commit, and you must have a clone of the Git repository for Git. The need for these numbers to be totally unique to each commit is what makes them so big; it also makes them unusable by humans, but the computer is good at them, so the computer uses them. We use branch names instead, as we'll see in a moment.
The numbering system requires that nothing ever be changed once it gets stored. So the big all-objects-database you find in a Git repository is completely read-only; all you ever do is add to it.[1]
Besides being numbered, each commit:
Holds a full snapshot of every file, in a compressed and (important for Git's internal operation and for your disk space needs) de-duplicated fashion. That is, when you make a new commit, if you have a million files but have only changed three of them, the new commit doesn't duplicate all 1 million files: it re-uses all but the new three.
These snapshots are like tar or WinZip archives, in a way, in that you can just have Git extract all the files from them any time you like. But they're not ordinary files: they're special Git-only compressed-and-de-duplicated things, that only Git can read, and nothing—not even Git itself—can overwrite. That's why they are safe to share across multiple commits.
Holds some metadata, such as the name and email address of the person who made the commit. Your log message goes in here too, when you make a commit. Crucially for Git's internal workings, Git adds, to this metadata, a list of previous commit hash IDs, which Git calls the parents of this commit.
Most commits just have a single parent. That is, most commits list one previous commit hash ID. This results in a simple backwards-looking chain. Suppose H
stands for the hash ID of your newest commit. We'll draw this with an arrow coming out of it, indicating that H
points to its parent commit (by storing the hash ID of the parent):
<-H
We'll call the parent commit hash ID G
for short, and draw in commit G
:
<-G <-H
Of course, G
is a commit too, so it has a list of parents; it's a typical commit, so it has just one parent, which we'll call F
, and we'll draw in F
:
... <-F <-G <-H
F
is a commit, with a parent, so it points back still further, and so on. By following this chain backwards, one commit at a time, Git can find every commit in the chain, all the way back to the very first commit ever. That commit (presumably A
in our simple example here) has an empty list of parent commit hash IDs, so that it doesn't point backwards at all, and that lets git log
stop going backwards.
So that's how git log
shows you the history. The history is nothing but the commits. The commits are the history; git log
starts wherever you are now (usually at the latest) and works backwards, one commit at a time.
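The backwards-pointing chain can be inspected directly. The following is a small sketch, assuming a scratch repository with three made-up commits; it reads the parent hash ID out of a commit's metadata and counts how many commits are reachable by walking backwards from the newest one.

```shell
set -e
# Scratch repository; file name, messages and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

for n in 1 2 3; do
  echo "version $n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done

# HEAD^ asks Git to follow the newest commit's parent link.
parent=$(git rev-parse HEAD^)

# The same hash ID is stored in the newest commit's metadata.
recorded_parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')

# Walking backwards from HEAD reaches every commit in the chain.
count=$(git rev-list --count HEAD)

# The very first commit has no parent line at all, which is where the walk stops.
root=$(git rev-list --max-parents=0 HEAD)
root_parents=$(git cat-file -p "$root" | grep -c '^parent ' || true)

echo "parent via HEAD^:   $parent"
echo "parent in metadata: $recorded_parent"
echo "commits reachable:  $count"
echo "parents of root:    $root_parents"
```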
There's just one nasty little problem right now, and that is: to use this, we'd have to memorize at least one Git commit hash ID. How are we going to find commit H
? How will we know it's the latest commit? This is where the branch names come in.
[1] Git will occasionally stick "junk" in here, and will clean it out on its own later. Calling it append-only is therefore a bit wrong technically. But unless you have truly enormous files (petabytes at a time) or are on ridiculously tiny storage quotas, you don't normally have to worry about this.
Let's draw our little diagram without bothering with the arrows this time (out of laziness):
...--G--H
We need a way to quickly find the random-looking hash ID H
. We'll have Git store that hash ID in a branch name, like main
or master
:
...--G--H <-- main
That is, the branch name main
will hold the raw hash ID of commit H
. We won't have to remember it ourselves: we'll have Git do that job, by having Git store the hash ID of H
under the name main
.
If we create another new branch name:
...--G--H <-- develop, main
then, right now, both names point to the same commit. But this is about to change, so now we need to know which of these names we're actually using to find commit H
. Let's say we use git switch develop
or git checkout develop
, so that we're using the name develop
, not the name main
, to find commit H
; we'll draw that like this:
...--G--H <-- develop (HEAD), main
Without (yet) explaining how Git goes about making the snapshot for a new commit, let's say we now make a new commit, which gets a new, totally-unique, big ugly random-looking hash ID, which we'll just call I
so that we don't have to guess it.
Commit I
will store a single snapshot of all files, plus some metadata. In the metadata for I
, Git will add our name and email address, our log message, the current date-and-time, and—so that history works—Git will set the (single) parent of new commit I
to be existing commit H
.
Git knows to use that hash ID because the name develop
, to which HEAD
is attached, currently points to commit H
. That is, we're on develop
, and develop
means "commit H
", so new commit I
should point back to existing commit H
. Git writes out the new commit metadata-and-snapshot and now we have:
          I
         /
...--G--H
Now Git does its sneaky trick. The name main
pointed to H
before, and still does. But Git, having allocated a new hash ID to new commit I
, makes the current name point to I
now:
          I   <-- develop (HEAD)
         /
...--G--H   <-- main
So now if we use the name develop
, we get commit I
, and if we use the name main
, we get back to commit H
. If we make another new commit, we'll have develop
pointing to the newest such commit, which will point backwards to now-existing commit I
, which will continue (forever) to point backwards to existing commit H
, and so on:
          I--J   <-- develop (HEAD)
         /
...--G--H   <-- main
Note that commits up through and including H
are on both branches, in Git's reckoning.
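Here is a minimal sketch of the same picture in a scratch repository, assuming made-up file contents and using the names main and develop as in the drawings. It checks that making a commit on develop moves only the name develop, and that the new commit points back to the old one.

```shell
set -e
# Scratch repository; contents and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
# Name the initial branch "main" regardless of the local default.
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "H"
H=$(git rev-parse main)

# Create develop and attach HEAD to it; both names now find commit H.
git checkout -q -b develop
develop_before=$(git rev-parse develop)

# Make commit I while on develop.
echo "two" > file.txt
git commit -q -a -m "I"
I=$(git rev-parse develop)
main_after=$(git rev-parse main)
parent_of_I=$(git rev-parse develop^)

echo "main:    $main_after"
echo "develop: $I (parent $parent_of_I)"
```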
We noted above, at the beginning of this, that all Git commits are permanent (well, mostly[2]) and read-only (completely). Moreover, nothing can write to the files in a commit, and only Git itself can even read those files. So how are we ever supposed to get any work done?
The fact that the snapshot in a commit is like an archive gets us the first part of the answer. To check out a commit (with git checkout
or git switch
), Git will extract the archive. That is, Git will de-Git-ize and de-compress the data for each file and store it in an ordinary file: one the computer can read and write as usual. All the programs on your computer can deal with these files, as they're literally ordinary files.
These files go into what Git calls your working tree or work-tree. It's literally where you do your work. You don't work on/with the files that are in Git. You work on files that aren't in Git, that are instead extracted to your work area. Almost all version control systems (VCSes) work this way, for the simple reason that the VCS-ized saved files are in some internal format.
If Git were like most other version control systems, we'd stop here, with the two copies of each file from the current commit: one stored forever inside the commit, and one usable one. You'd work on the usable files and then use the "make new commit" action and Git would make the new commit from the updated files.
Git isn't like this. Instead, Git has another trick up its sleeves.[3] Instead of keeping two copies of each file (the committed one and the working one), Git keeps three copies of each file. Or rather, three "copies": in between the committed copy and the working copy, Git keeps an extra "copy", stored in the compressed-and-de-duplicated format, but not read-only. Because it's de-duplicated, this copy is initially shared with the committed copy. The de-duplication is invisible though, so we don't have to worry about it: we can just think of it as a third copy.
In other words, instead of just:
HEAD commit      working tree
-----------      ------------
Makefile         Makefile
README.txt       README.txt
main.py          main.py
we have a third copy of each file, in what Git calls—in Git-y fashion—by three different names: the index, or the staging area, or (rarely these days) the cache. All three names refer to the same thing, and I'll use the name index here, but staging area is closer to the way you mostly use it, so feel free to use that name in your head:
HEAD commit      index            working tree
-----------      ----------       ------------
Makefile         Makefile         Makefile
README.txt       README.txt       README.txt
main.py          main.py          main.py
When you change the working tree copy, nothing happens to the index copy. You must run git add
regularly; the git add
command means make the index copy match the working tree copy. Git will, at git add
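time, read the working tree copy, compress it into the Git format, check for duplicates, and then:
if it's a duplicate, discard the newly compressed copy and re-use the existing one; or
if it's not a duplicate, store the newly compressed copy in the index;
and now either way, the index copy of the named file matches the working tree copy (and is pre-de-duplicated).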
This means that the index copy is, at all times, ready to go into the next commit. Thus, what's in the staging area is what will go into the next commit. It is, in effect, the proposed next commit. You edit files in your working tree just to edit them, and then you use git add
to update your proposed next commit.
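The three copies can be looked at one by one. This is a small sketch with a made-up main.py in a scratch repository: git show HEAD:main.py prints the committed copy, git show :main.py prints the index copy, and cat prints the working tree copy.

```shell
set -e
# Scratch repository; file contents and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > main.py
git add main.py
git commit -q -m "first version"

# Edit the working tree copy only.
echo "v2" > main.py
head_copy=$(git show HEAD:main.py)
index_copy=$(git show :main.py)
work_copy=$(cat main.py)
echo "before git add: HEAD=$head_copy index=$index_copy worktree=$work_copy"

# git add makes the index copy match the working tree copy.
git add main.py
index_after_add=$(git show :main.py)
head_after_add=$(git show HEAD:main.py)
echo "after git add:  HEAD=$head_after_add index=$index_after_add worktree=$(cat main.py)"
```

The committed copy stays at the old version until the next git commit; only the index copy changes when git add runs.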
[2] A commit that you can't find will eventually go away for real. We'll see that in a bit when we talk more about git reset.
[3] What kind of shirt or whatever does Git wear anyway?
Since the index holds, at all times, the proposed next commit, all git commit
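has to do is:
package up the files in the index as the new commit's snapshot;
add the metadata (your name and email address, your log message, the date and time, and the current commit's hash ID as the parent);
write all of this out as a new commit, which gets a new hash ID; and
store that new hash ID in the current branch name.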
Let's use our sample repository, which at this point looks like this:
          I--J   <-- develop (HEAD)
         /
...--G--H   <-- main
and watch the action as we git switch main
or git checkout main
, make a new branch name, switch to that new name, and then make a new commit:
git checkout main
or git switch main
: this
removes (from index and working tree) the current commit's files (the files from J
);
extracts the main
-commit's files (from H
) into index and working tree; and
leaves us with this:
          I--J   <-- develop
         /
...--G--H   <-- main (HEAD)
git checkout -b feature
or git switch -c feature
: this
creates a new branch name feature
pointing to H
;
switches to it: this would involve removing files from H
and installing files from H
, but Git sees that that's pointless and skips it;
leaves us with this:
          I--J   <-- develop
         /
...--G--H   <-- feature (HEAD), main
We now modify some files in the working tree. Nothing happens to Git's index yet, but then we run git add
on those files, and now the versions in the index of the add
-ed files match.
If we like, we can create new files from scratch, and add those. Or we can remove a file entirely, with git rm
: that removes it from both the working tree and the index.
Now we run git commit
. Git packages up whatever is in the index right now and makes a new commit that updates the current branch name, i.e., feature
, so we end up with:
          I--J   <-- develop
         /
...--G--H   <-- main
         \
          K   <-- feature (HEAD)
Note how git commit
simply adds on to the drawing. No existing commit changes. If commit H
has some files in it that we removed when we made K
, that just means that commit K
lacks those files. They're still there in commit H
. It's the commits that matter. The commits hold the files. Find the commit, check it out, and you'll get the files.
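To illustrate that a file removed in a newer commit is still there in the older one, here is a short sketch with two made-up files in a scratch repository. The names keep.txt and gone.txt are placeholders.

```shell
set -e
# Scratch repository; file names and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "keep" > keep.txt
echo "gone" > gone.txt
git add keep.txt gone.txt
git commit -q -m "H"
H=$(git rev-parse HEAD)

# git rm removes the file from both the index and the working tree.
git rm -q gone.txt
git commit -q -m "K"

# The new commit's snapshot lacks the file...
in_K=$(git ls-tree --name-only HEAD)
# ...but the old commit still holds it, and Git can extract it at any time.
recovered=$(git show "$H:gone.txt")

echo "files in K: $in_K"
echo "gone.txt as stored in H: $recovered"
```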
git reset
With all the above in mind, we can now understand what git reset
does—or at least, what git reset --soft
, git reset --mixed
, and git reset --hard
do. The git reset
command is very big inside and can do a lot of other things too, if you want it to; we are only going to cover the basic three here.
Suppose we have made another commit:
          I--J   <-- develop
         /
...--G--H   <-- main
         \
          K--L   <-- feature (HEAD)
and we suddenly realize that commit L
was terrible for some reason: wrong snapshot, bad commit message, whatever. We have several options, but the easiest one is to use git commit --amend
. This command is a lie: it doesn't change commit L
, it just makes a new commit L'
that has commit K
as its parent:
          I--J   <-- develop
         /
...--G--H   <-- main
         \
          K--L'  <-- feature (HEAD)
           \
            L
Commit L
still exists. We just can't find it any more because we use the names, not the hash IDs. The name feature
now finds the "amended" commit L'
, not the original L
. But we won't talk here about using git commit --amend
; instead, we'll talk about using git reset
.
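As a quick check that amending really makes a new commit and leaves the old one in the object database, here is a sketch in a scratch repository. The commit messages and file are made up; the variable names K, L and L2 simply mirror the drawing, with L2 standing in for L'.

```shell
set -e
# Scratch repository; contents and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "K"
K=$(git rev-parse HEAD)

echo "two" > file.txt
git commit -q -a -m "L"
L=$(git rev-parse HEAD)

# "Amend" L: this writes a new commit whose parent is K.
git commit -q --amend -m "L, with a better message"
L2=$(git rev-parse HEAD)
parent_of_L2=$(git rev-parse HEAD^)

# The original L is still in the object database; no name finds it any more.
still_there=no
if git cat-file -e "${L}^{commit}"; then still_there=yes; fi

echo "original L: $L"
echo "amended L': $L2 (parent $parent_of_L2)"
echo "original still stored: $still_there"
```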
The git reset
command works by letting us move the current branch name. We can pick any commit, and make the name feature
point to that commit. For instance, we could pick commit G
if we wanted to. But let's pick commit K
, using HEAD^
or HEAD~
to find it.[4] Any of our three git reset
commands, given the hash ID of commit K
or a name that finds the hash ID of commit K
, will do this:
          I--J   <-- develop
         /
...--G--H   <-- main
         \
          K   <-- feature (HEAD)
           \
            L
Commit L
still exists, but the name feature
now points to commit K
:
If we use git reset --soft HEAD^
, Git moves the branch name, and then stops: the index and working tree are still from commit L
.
If we use git reset --mixed HEAD^
or git reset HEAD^
(the default is --mixed
), Git moves the branch name, yanks all the commit-L
files out of the index, and inserts into the index all the commit-K
files. Then reset stops here.
If we use git reset --hard HEAD^
, Git moves the branch name and yanks all the commit-L
files out of the index and our working tree, and installs into the index and our working tree the commit-K
files.
So this kind of git reset
can do up to three things:
1. move the current branch name;
2. reset the index;
3. reset the working tree.
The flags tell it when to stop: --soft
says to do step 1 and stop. The default is to do steps 1 and 2 and stop. The --hard
flag tells it to do all three steps.
If we like, we can git reset --hard HEAD
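. That tells Git:
move the current branch name to the commit that HEAD selects, i.e., the commit it already points to;
reset the index to match that commit; and
reset the working tree to match that commit.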
Because the commit we picked in step 1 is the commit the name already points to, the "move the branch" part was a no-op. The name didn't actually move anywhere. We used this git reset
for its steps 2 and 3. It still did step 1, it just didn't achieve anything by doing step 1.
We can use git reset HEAD
to make Git do nothing during step 1 and then reset the index, without touching the working tree. Note that if we leave out the commit hash ID—if we run git reset
or git reset --hard
—we get a mixed or hard reset that, in step 1, doesn't move the branch. But we're always doing step 1, even if it's just a big nothing.
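The difference between the three modes can be seen by running each one from the same starting point. This is a minimal sketch, assuming a scratch repository with a made-up f.txt that holds v1 in commit K and v2 in commit L; the helper function state is an illustrative name, not a Git command.

```shell
set -e
# Scratch repository; file name, contents and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > f.txt
git add f.txt
git commit -q -m "K"
echo "v2" > f.txt
git commit -q -a -m "L"
L=$(git rev-parse HEAD)

# Report what the index and the working tree hold for f.txt.
state() { echo "index=$(git show :f.txt) worktree=$(cat f.txt)"; }

git reset -q --soft HEAD^       # move the branch name only
soft=$(state)

git reset -q --hard "$L"        # put everything back to commit L
git reset -q --mixed HEAD^      # move the name, reset the index
mixed=$(state)

git reset -q --hard "$L"        # back to commit L again
git reset -q --hard HEAD^       # move the name, reset index and working tree
hard=$(state)

echo "--soft : $soft"
echo "--mixed: $mixed"
echo "--hard : $hard"
```

In all three cases the branch name ends up at K; what differs is how much of the L version is left behind in the index and the working tree.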
[4] This syntax (the suffix ^ or ~) is part of a whole series of ways Git has of specifying commits. Since the commit is the raison d'être of Git, there should be a lot of ways to name a commit, and there are. See the gitrevisions documentation for a complete list of ways to name Git internal objects (mainly commits, but you can name the others as well).
git reset (i.e., mixed) is what you wanted
Using:
git reset HEAD^
you would have:
moved the current branch name back one commit (step 1);
reset the index to match that commit (step 2); and
left your working tree alone (--mixed suppresses step 3).
You could now git add each of the files you want new or updated in the index, and not git add any files you didn't want updated in the index. In other words, you could now do the same thing you normally do, all the time, with Git.
By using git reset --soft HEAD^
, you did step 1, but not step 2. So that meant you then had to adjust Git's index to not contain the files you didn't want to commit. That's also something you will do now and then in Git, but it's less common than git add
-ing files that you do want to commit. It's not harmful to do it "backwards", it's just easier to get wrong.
git update-index --assume-unchanged is wrong
Git always makes new commits from the files that are in Git's index. As such, the git status
command has, as one of its jobs, the job of telling you about files in your working tree that don't match the copies in Git's index.
That is, suppose you've modified three files from the contents they had earlier. Then you ran git add
on one of them. Let's list what's in each of the three "active" copies of each file, with a version number added. You started with:
HEAD commit      index            working tree
-----------      ----------       ------------
Makefile(1)      Makefile(1)      Makefile(1)
README.txt(1)    README.txt(1)    README.txt(1)
main.py(1)       main.py(1)       main.py(1)
After modifying all three files in the working tree, you have:
HEAD commit      index            working tree
-----------      ----------       ------------
Makefile(1)      Makefile(1)      Makefile(2)
README.txt(1)    README.txt(1)    README.txt(2)
main.py(1)       main.py(1)       main.py(2)
Now you run git add main.py
, forgetting to add Makefile
and README.txt
. You get:
HEAD commit      index            working tree
-----------      ----------       ------------
Makefile(1)      Makefile(1)      Makefile(2)
README.txt(1)    README.txt(1)    README.txt(2)
main.py(1)       main.py(2)       main.py(2)
One of the jobs of git status
is to compare the index and working tree copies and complain if they don't match. The result is a complaint that, hey, you forgot to git add
those two files.
The index copy of each file has two special flags you can set:
assume-unchanged, and
skip-worktree.
These two flags have different purposes, but both of them are currently implemented the same in terms of git status
: they both make git status
not bother complaining about the files when they don't match.
Running git commit
at this point would make a new commit in which main.py
is updated, but README.txt
and Makefile
are not updated. In your case, the problem was that you added new .angular/cache/*
files to Git's index. Setting "assume unchanged" on those requires that there be some copy of each of those files in Git's index (you cannot set these flags on files that aren't in Git's index). But you want each commit you make to lack these files entirely. You want the files to not be in Git's index.
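A short sketch of why the flag does not help here, assuming a scratch repository and a made-up cache file: setting assume-unchanged leaves the file in the index, so the next commit still contains it.

```shell
set -e
# Scratch repository; file name and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir -p .angular/cache
echo "cache data" > .angular/cache/big.bin
git add .angular/cache

# The flag can only be set because the file is already in the index.
git update-index --assume-unchanged .angular/cache/big.bin
in_index=$(git ls-files .angular/cache)

# The commit is built from the index, so the file goes in anyway.
git commit -q -m "commit made with the flag set"
in_commit=$(git ls-tree -r --name-only HEAD)

echo "still in index:  $in_index"
echo "in the commit:   $in_commit"
```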
.gitignore files
Listing files in a .gitignore
does not affect whether the files are in Git's index. A file that is in Git's index right now, regardless of why it's there, makes that file a tracked file. Git's git status
command will (in the absence of assume-unchanged or skip-worktree) complain about files that are in Git's index and your working tree and don't match. It doesn't matter whether these files are listed in a .gitignore
or not: the files are tracked, so they'll be in the next commit, so Git will complain if they don't match.
What listing files in .gitignore
does do is suppress a different complaint. Suppose you have some file xyzzy
in your working tree right now. (Maybe you made it by pasting something you wanted to remember into a file, with the intent of removing it as soon as you've taken care of whatever it is.) Suppose further that this file isn't in Git's index, so it won't be in your next commit, and it shouldn't be in any commit. Its presence in your working tree, though, will make git status
complain that xyzzy
is an untracked file.
An untracked file is, by definition, any file that exists in your working tree, but not in Git's index. (Any file that is in Git's index is a tracked file.) And git status
complains about these, and you can't set any index-entry flags for these because they're not in Git's index on purpose. So there needs to be a way to stop git status
from complaining—and that's the first part of what a .gitignore
entry does.
Listing a file name or pattern in a .gitignore
tells git status
, hey, shut up about these files when they're untracked, I don't want to hear it, it's on purpose. To help out with git add
, listing those files also means and when they're untracked, if I use an en-masse "add all files" command like git add .
, don't add them either, and that's the second part of what a .gitignore
entry does.
What all this means is that .gitignore
is the wrong name for the file. It should be .git-do-not-complain-about-these-files-when-they-are-untracked-and-do-not-auto-add-them-with-en-masse-git-add-commands-either
. But that's a ridiculous name to type, so .gitignore
it is.
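Both halves of that can be checked in a scratch repository. In this sketch the file names tracked.log and untracked.log and the pattern *.log are made up: the already-tracked file keeps showing up as modified even though it matches the pattern, while the untracked one is neither reported nor picked up by git add .

```shell
set -e
# Scratch repository; file names, pattern and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Track a file first, then add an ignore pattern that matches it.
echo "tracked" > tracked.log
git add tracked.log
git commit -q -m "track a log file"
echo "*.log" > .gitignore
git add .gitignore
git commit -q -m "ignore log files"

# Modify the tracked file and create a new, untracked one.
echo "changed content" > tracked.log
echo "junk" > untracked.log

# Only the tracked file is reported; the untracked one is silently skipped.
status=$(git status --porcelain)

# An en-masse add stages the tracked file but not the ignored untracked one.
git add .
staged=$(git diff --cached --name-only)

echo "status: [$status]"
echo "staged: [$staged]"
```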
List unwanted files in .gitignore files, but be aware that that's not sufficient on its own: you have to make sure you haven't already added them.
The git reset command moves the current branch name and then optionally resets the index and the working tree.
Moving the branch name is part of what makes git reset a dangerous command: it can leave a commit that no name finds, and if you can't find a commit, what good is it? Also, git reset --hard erases stuff from your working tree, and working tree files are not in Git. They may have come out of a commit (in which case you can get them again, from that same commit), but they may not have (e.g., if you spent all day updating them and haven't committed yet).
Upvotes: 2
Reputation: 2105
Adding your files to .gitignore alone is not enough.
You should do this:
git update-index --assume-unchanged <file_path>
and add your files to .gitignore
If you want to do this to a directory, open that directory in your shell (using cd) and execute this:
git update-index --assume-unchanged $(git ls-files | tr '\n' ' ')
Upvotes: 0