Reputation: 11
Beginner user here trying to learn git. I was writing a simple program, using checkout -b and commit -m with a note, to 'save' my progress along the way. I used status and branch to check that the branches were being made, but then I closed the terminal. Now I run git status and it says I'm on a branch but lists 'no commits yet' underneath. The JS I've been writing along the way is still there.
Should I be using commit -a every time I use commit -m? I can't tell if I'm having an issue because I'm starting up wrong or saving my branches wrong.
Upvotes: 1
Views: 2654
Reputation: 488213
Your files are just in your work-tree. You have not yet committed anything! There's no harm done: you can add and commit your files now. But you will definitely have to use git add first.
As a beginner using Git, you will be starting in one of two "modes", as it were, and they require somewhat different approaches. There is a unifying theme underneath this but it helps to start without it! Note: this is pretty long, but I recommend you read through it all.
In mode #1, starting a brand-new project, you need to create a directory and enter it (you may have done this already) and then run git init:
$ mkdir new-project
$ cd new-project
(you might have done these steps already)
$ git init
Initialized empty Git repository in .../new-project/.git
You can now create files in the new-project directory that you are already in, or you might already have some files there. If you run git status you will get:
$ git status
On branch master
No commits yet
nothing to commit (create/copy files and use "git add" to track)
(older versions of Git don't say "no commits yet" as they print a longer and stupider message that means the same thing, but is confusing.)
In this mode, there is no need to run git checkout -b. It won't really hurt if you do, but there is a weird thing going on: you're on branch master but branch master does not exist yet. If you use git checkout -b to change which branch you're on, you'll continue to have no branches. You'll be "on" whichever branch you select, and it will still not exist!
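You can see this for yourself with a small sketch. It assumes a throwaway directory made by mktemp, and feature is only a hypothetical branch name used for illustration:

```shell
# Throwaway repository in a temporary directory (an assumption for this demo).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Change which (not-yet-existing) branch we are "on".
git checkout -q -b feature

# HEAD names the branch, but no branch actually exists yet.
current=$(git symbolic-ref --short HEAD)
branches=$(git branch --list)

echo "on branch: $current"
echo "existing branches: '$branches'"
```

The second line of output should show an empty list: the name you are "on" has changed, but there is still no branch.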
Having created some files, you must now run git add. The git add step copies the files to the staging area (really a hidden file inside the .git directory) so that they will be in your next commit. You cannot use git commit -a here, as we will see in a moment.
If you run git commit without first running git add, the commit attempt will fail:
$ echo I am a readme file > README
$ ls
README
$ git commit -m "a bad commit message"
On branch master
Initial commit
Untracked files:
README
nothing added to commit but untracked files present
$ git status
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
README
nothing added to commit but untracked files present (use "git add" to track)
Note that git status still says No commits yet. The git commit command did not do anything!
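If you would rather check this from a script than read the messages, the exit status tells the same story. This is a minimal sketch in a temporary directory; the user name and email are placeholders, set only so that a missing identity cannot be the reason for the failure:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "I am a readme file" > README

# Nothing has been staged with git add, so git commit refuses.
if git commit -q -m "initial commit" >/dev/null 2>&1; then
  commit_result="committed"
else
  commit_result="refused"
fi
echo "commit attempt: $commit_result"

# HEAD cannot be resolved to a commit, because no commit exists.
if git rev-parse -q --verify HEAD >/dev/null; then
  has_commit="yes"
else
  has_commit="no"
fi
echo "any commits yet? $has_commit"
```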
There is no harm done here, as you can now git add your files and git commit them. That's what you need to do in this particular case: just open a terminal window again, navigate to your project (cd new-project or whatever), and then use git add and git commit.
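Here is roughly what that recovery looks like, as a sketch rather than an exact recipe for your project. The file name app.js, the commit message, and the identity values are all made up for the demo:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

# A hypothetical file standing in for the JS you have been writing.
echo "console.log('hello');" > app.js

git add app.js                              # copy the file into the staging area
git commit -q -m "Add first version of app.js"

# The first commit exists now, and so does the branch you were "on".
commit_count=$(git rev-list --count HEAD)
branch_count=$(git branch --list | wc -l | tr -d ' ')
echo "commits: $commit_count, branches: $branch_count"
```

After this, git status should no longer say "No commits yet".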
In mode #2, cloning an existing repository, you start with:
git clone <url>
where the url part comes from someone else, perhaps GitHub for instance:
$ git clone https://github.com/git/git
Cloning into 'git'...
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Total 273170 (delta 1), reused 1 (delta 1), pack-reused 273168
Receiving objects: 100% (273170/273170), 122.67 MiB | 3.53 MiB/s, done.
Resolving deltas: 100% (202836/202836), done.
$
At this point you're not in the project yet, and must still navigate into it:
$ cd git
Now you have an existing project (in this case, the source code for Git) with a lot of existing commits holding a lot of existing files. The output from running ls here is long and I won't show it at all.
The key difference is that at this point, you have some commits, a branch name, and a remote-tracking name such as origin/master.
You can now modify files and use git commit -a, if you like (though I recommend against it). You can use git checkout -b to create a new branch first, and this one actually does create a new branch.
Actually, the two modes are not as different as they might look at first. The real key is that the cloned repository has some commits in it.
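To try this without downloading anything, you can clone a local repository. In this sketch, upstream is a stand-in for the other Git, and my-clone and feature are hypothetical names:

```shell
work=$(mktemp -d)
cd "$work"

# Stand-in for "someone else's" repository: one commit, made locally.
git init -q upstream
cd upstream
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "hello" > README
git add README
git commit -q -m "Initial commit"
cd ..

# Clone it, then create a branch; this time the branch really exists.
git clone -q upstream my-clone
cd my-clone
git checkout -q -b feature
branches=$(git branch --list --all)
echo "$branches"
```

The listing should include feature, the branch copied from upstream, and at least one remotes/origin/ name.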
To Git beginners, it might seem that Git is all about branches. It's not, really: it's all about commits. Branches, or really branch names like master, are important because they help Git, and thus also help you, find commits. It might also seem that Git is all about files, but it's not: it's about commits. Commits hold files, but really, the commit is the basic unit of Git.
It's also worth noting here that git clone really means:
1. Run mkdir to create a new directory, then enter the directory and run git init, to make a new, totally-empty repository.
2. Add a remote: a short name that remembers the URL. The usual name is origin, though you can choose another one if you want.
3. Run git fetch, to obtain commits from some other Git. There needs to be a Git listening at that URL. That Git has commits; that Git gives you their commits. That Git has branch names that it uses to find those commits, so that Git gives you its branch names too. That Git may have tag names that it can also use to find commits; if so, that Git gives you their tag names.
4. Rename their branch names into your remote-tracking names: their master becomes your origin/master; their maint becomes your origin/maint; and so on. (If that Git gives you some tag names, your Git just takes them as-is.)
5. Pick one of their names, such as master. Which one to pick is a little tricky. You can tell your Git, at git clone time, which one to pick, including tag names. If you don't pick one, their Git is supposed to recommend a particular branch name (never a tag name) and your Git will pick that one. If all else fails, your Git picks master.
As its final step, your git clone takes the name picked by the last step above and creates, or tries to create, that branch name in your repository, matching the one it got from the other Git. If we assume the name is master, you now have a master branch that matches your origin/master, and some commits. This creating-your-master is done via git checkout: your Git checks out their master branch's commit, which is your origin/master, and then adds your own name master to identify the same commit. We'll see more about this below.
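The same sequence can be run by hand, which makes the pieces easier to see. This is only an approximation of what git clone does, using a local upstream repository as a stand-in for the other Git; the directory names are hypothetical:

```shell
work=$(mktemp -d)
cd "$work"

# Stand-in for the other Git, with one commit in it.
git init -q upstream
cd upstream
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "hello" > README
git add README
git commit -q -m "Initial commit"
their_branch=$(git symbolic-ref --short HEAD)   # whatever their branch is called
cd ..

mkdir by-hand                              # new directory...
cd by-hand
git init -q                                # ...holding a totally-empty repository
git remote add origin "$work/upstream"     # remember the URL under the name origin
git fetch -q origin                        # get their commits; their branches become origin/*

# Create our own branch name pointing at the same commit as origin/<their branch>.
git checkout -q -b "$their_branch" "origin/$their_branch"

ours=$(git rev-parse "$their_branch")
theirs=$(git rev-parse "origin/$their_branch")
echo "ours:   $ours"
echo "theirs: $theirs"
```

Both lines should print the same hash ID, since your new branch name and your remote-tracking name identify the same commit.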
There is one weird case: if you clone a totally-empty repository (one that has no commits), it won't have any branch or tag names either, so your Git will be unable to get a recommended branch or tag name from them and unable to create a master branch. This puts your Git back in the situation you had in mode #1!
In other words, the real difference is whether you have some commits to start with. Git is all about commits, and with no commits, Git is kind of helpless and useless. The very first commit is really important: not its contents, not that much, but its existence. Until you have a first commit, you cannot have any branches or tags.
One of the big secrets (ok, not actually secret at all) to understanding Git is to learn that a branch name like master just holds the hash ID of one single Git commit. Every Git commit has a big ugly hash ID, such as 745f6812895b31c02b29bdfe4ae8e5498f776c26. This hash ID is unique to this particular commit. No other commit can have this ID. But who wants to try to remember 745f6-something-something-whatever? We have a computer, so why don't we have the computer remember it, and give it a nice simple name, like, say, master?
That's precisely what a branch name is and does: it's a nice simple name that remembers some big ugly hash ID for us. What makes branch names special, though, is that they don't just remember some hash ID. Instead, they remember the latest hash ID.
As you make new commits, Git will move your branch name—the name of whichever branch you have checked-out—so that it remembers the hash ID of the last commit. Each new commit you make will automatically remember the hash ID of the previous commit. So, by starting at the end and working backwards, Git can find its way to each previous commit:
... <-F <-G <-H <--master
If H stands in for the hash ID of the last commit, and G stands in for H's remembered "previous commit" hash ID, we say that H points to G. The name master points to H. From H, Git can find G; from G, Git can find F. This repeats until we get all the way back to that very first commit. The first commit can't point to any earlier commit, so it just doesn't.
To make a new commit (let's call it I), we have Git write out all our files into a new commit, add our log message and our user name and all that stuff, and set I's previous commit to be commit H, so that I points back to H:
...--F--G--H   <-- master
            \
             I
Then all Git has to do is to write the hash ID of I into the name master, so that master now points to I, and the commit is done and incorporated into our Git repository:
...--F--G--H
            \
             I   <-- master
When you make new commits, the existing commits stay put. The new commit points back to some previous, existing commit. The branch name changes to point to the new commit.
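You can watch the branch name move with git rev-parse. A small sketch, assuming a temporary repository and a hypothetical file.txt:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
old_tip=$(git rev-parse HEAD)              # hash ID the branch name holds now

echo "two" >> file.txt
git add file.txt
git commit -q -m "Second commit"
new_tip=$(git rev-parse HEAD)              # the branch name has moved
parent_of_new=$(git rev-parse HEAD^)       # the new commit points back to the old one

branch=$(git symbolic-ref --short HEAD)
branch_target=$(git rev-parse "$branch")

echo "branch $branch -> $branch_target"
echo "parent of new commit: $parent_of_new"
echo "old commit still exists as: $(git cat-file -t "$old_tip")"
```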
When you start with a totally-empty repository, though, there are no commits for any names to point to. Git requires every branch name to point to some existing commit, so no names are allowed to exist either. This is why Git is a little weird and useless on a new, totally-empty repository: with no commits to point-to, no branches can exist! You must make that first commit. From then on, you can have as many branch names as you want—though if there's only one commit, every branch name will point to that one commit!
A <-- branch1, branch2, branch3, ..., branchN, master
Here, all branches just have the one commit, which we're calling A, but which actually has some big ugly hash ID as its real name.
Note that you can remove branch names at any time, too. If we have the above, and remove all the numbered branch names (branch1 through branchN), we get back to a much more sensible:
A <-- master
and now there's just one branch, master, holding the one commit that is in this repository.
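A short sketch of the same idea, with branch1 and branch2 as throwaway names. Creating and deleting the extra names never touches the commit itself:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "hello" > README
git add README
git commit -q -m "Commit A"

# Two more names for the same single commit.
git branch branch1
git branch branch2
tip=$(git rev-parse HEAD)
tip1=$(git rev-parse branch1)
tip2=$(git rev-parse branch2)
echo "all three names hold: $tip"

# Remove the extra names; the commit stays put.
git branch -q -d branch1 branch2
remaining=$(git branch --list | wc -l | tr -d ' ')
commits=$(git rev-list --count --all)
echo "branches left: $remaining, commits: $commits"
```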
Remote-tracking names work the same way, except that instead of your Git moving them, the other Git moves its branch names. You then have your Git call up their Git, find out where their branch names have moved, and have your Git move your remote-tracking names to match:
A <-- master, origin/master
If they add a new commit B, their master will point to this new B. You run git fetch and you get:
A <-- master
 \
  B <-- origin/master
because your Git sees that their master now identifies (new) commit B, so your Git brings over commit B to your repository, then updates your remote-tracking name origin/master.
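This can be reproduced with two local repositories. In the sketch below, upstream plays the part of the other Git and my-clone is yours; both names are assumptions for the demo:

```shell
work=$(mktemp -d)
cd "$work"

# "Their" repository with commit A.
git init -q upstream
cd upstream
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo "A" > file.txt
git add file.txt
git commit -q -m "Commit A"
branch=$(git symbolic-ref --short HEAD)
cd ..

git clone -q upstream my-clone

# They add commit B after you cloned.
cd upstream
echo "B" >> file.txt
git add file.txt
git commit -q -m "Commit B"
commit_b=$(git rev-parse HEAD)
cd ..

# You fetch: only your remote-tracking name moves.
cd my-clone
before=$(git rev-parse "$branch")
git fetch -q origin
local_after=$(git rev-parse "$branch")
remote_tracking=$(git rev-parse "origin/$branch")

echo "your $branch:        $local_after"
echo "your origin/$branch: $remote_tracking"
```

Your own branch name should still hold commit A's hash ID, while origin/ now holds commit B's.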
If you have a remote named origin, your remote-tracking names origin/* remember their branch names, which remember the last commit in their branch.
You choose which branch you are "on" with git checkout.
In a new, totally-empty repository, git checkout -b will change the name of the branch that you're "on".
So this is how you will bootstrap your initial, totally-empty repository: by making that first commit, you'll create whatever branch you chose to be on. The usual way to do this is to let that first branch be called master. (Some external tools, and/or some people, will get confused if you make your first branch something else: everyone expects master. If you call it the-spanish-inquisition....)
We noted above that while Git is all about commits, each commit holds some set of files. In fact, each commit represents a full snapshot of every file (well, of every file that goes in that commit, but that seems kind of redundant). We didn't note, but it's important, that everything in every commit is frozen for all time. Nothing about any existing commit can ever be changed! The user name and email address that Git put in that commit, the log message you entered, the frozen copies of every file in the snapshot: all of these are quite impossible to change.[1] The commit with that hash ID always contains those copies of files. If you have that commit, you have that version of all the files: you have that snapshot.
Now, obviously, if Git made new copies of every file every time you make a new commit, the repository would rapidly become terribly fat. So Git doesn't actually do that. When Git freezes copies of your files, it uses a couple of tricks. The first trick is that it compresses the file.[2] So what's inside Git is sort of a dehydrated version of your file, that's also frozen for all time. I like to call this a freeze-dried copy of the file. The second and perhaps more important trick is that, having made this freeze-dried copy that can never be changed, Git can just keep referring back to the existing copy in every new commit that re-uses that file unchanged.
Hence, if you have six commits (six snapshots) of your files, but five of the six have the same contents for the README file, all five of those snapshots share one underlying freeze-dried README, and the sixth has its own currently-private freeze-dried copy. If a later commit wants to use the same data for any file, even one not named README, Git can just re-use one of those freeze-dried copies.
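The sharing is visible if you ask Git for the hash ID of the stored copy of a file in two different commits. A sketch with two commits and a hypothetical app.js that changes while README does not:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "I am a readme file" > README
echo "version 1" > app.js
git add README app.js
git commit -q -m "First snapshot"
first=$(git rev-parse HEAD)

echo "version 2" > app.js                  # only app.js changes
git add app.js
git commit -q -m "Second snapshot"
second=$(git rev-parse HEAD)

# <commit>:<path> names the stored copy of that file in that commit.
readme_first=$(git rev-parse "$first:README")
readme_second=$(git rev-parse "$second:README")
app_first=$(git rev-parse "$first:app.js")
app_second=$(git rev-parse "$second:app.js")

echo "README: $readme_first / $readme_second"
echo "app.js: $app_first / $app_second"
```

The two README hash IDs should match, meaning both snapshots refer to one stored copy, while the two app.js hash IDs should differ.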
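This gives us a clearer picture of what a commit is: a full snapshot of files (as shared, freeze-dried copies), plus your user name and email address, your log message, and the hash ID of the previous commit. Putting all of this stuff together is what gets us the hash ID of the commit.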
There are a couple of problems, though. The first and most obvious is: freeze-dried copies of files that only Git can read—and never change—are not going to help us get any actual work done.
[1] The reason for this ties into the unique hash ID for each commit. The hash ID of a commit is actually a cryptographic checksum of its contents. This means that if you take a commit out, make some change to the contents, and try to put the result back, what you get is a new and different commit with a new, unique hash ID. The original commit is still there, untouched! All you did was add a new commit. This becomes particularly important later (but not in this answer).
[2] Technically, what Git does is to use zlib deflate on the object, rather than on the file itself. The object is a Git blob that holds the file's data. This is not really all that crucial, although as a happy side effect, it also protects against using the one known "shattered" PDF file SHA-1 collision.
git add
In order to get any work done, Git needs to give you the ability to extract any given commit. The place where you extract the commit is your work-tree or working tree, or some variation on this name. You run git checkout, and give it a hash ID or a branch name (something that finds some particular commit in your repository) and your Git fills in your work-tree from that commit.
In a new, totally-empty Git repository, you still have a work-tree. You can do whatever you want in this work-tree. Git just creates it as an initial, empty (well, mostly-empty) directory or folder, whichever term you prefer. In your work-tree, Git hides a sub-directory / sub-folder named .git, and Git keeps all of its files in this .git directory.[3]
Note that other than filling it up with files (and sub-directories if needed to hold files), and copying files back out of it when you use git add, Git doesn't actually do anything with your work-tree. The work-tree is for you, not for Git. This is where things get a little weird.
In other version control systems, such as Mercurial (hg), you work in your work-tree and then you run hg commit. The VCS scans your work-tree to see what you changed, and makes a new commit with the changes from the work-tree. Git doesn't do that: it doesn't store changes, and it doesn't scan the work-tree.
If you had just the repository itself plus the work-tree, you'd have two copies of each of the files from the current commit: the frozen copy that Git used to fill the work-tree when you did a git checkout, and the normal-format copy in the work-tree. That's how hg checkout works and it all makes perfect sense. If you want to change the file, you just change it in the work-tree. The work-tree is your proposed next commit. But that's not what Git does.
Instead, Git adds a third copy of each file. This third copy goes into what Git calls, variously, the index, or the staging area, or (rarely these days) the cache. These three names all refer to the same thing.[4] The index holds the freeze-dried copy of the file, just like the commit. This freeze-dried copy is automatically ready to go into any new commit that you might make.
When you run git add, Git reads and compresses (i.e., freeze-dries) the work-tree copy of the file and stuffs that into the index / staging-area. So git add makes the file ready to commit again. If the file wasn't in the index before, now it is. If it was in the index before, it's been replaced. Either way, the file is now ready to be committed.
This means that when you run git commit, all Git has to do is package the pre-frozen files that are already in the index into a new commit. That's (mostly) why, if you have a big project and run git commit, it happens almost instantly, but if you're using Mercurial and run hg commit, it takes seconds before it responds. Mercurial has to scan the work-tree. Git can ignore the work-tree.[5]
This index or staging area (whichever name you prefer) is, in effect, the proposed next commit. It's in a form that makes git commit go fast. But this puts a burden on you, the programmer: you have to run git add to copy files into the index / staging-area before you commit them.
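You can peek at the index with git ls-files --stage. The sketch below, in a throwaway repository, shows that git add records a copy of the file as it was at that moment, and that later edits to the work-tree copy do not change the index until you git add again:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "I am a readme file" > README
before=$(git ls-files --stage)             # nothing is in the index yet

git add README                             # copy README into the index
staged_hash=$(git ls-files --stage README | awk '{print $2}')
worktree_hash=$(git hash-object README)    # hash ID for the current work-tree contents

echo "changed after staging" > README      # edit the work-tree copy only
new_worktree_hash=$(git hash-object README)
index_hash_after_edit=$(git ls-files --stage README | awk '{print $2}')

echo "index before git add: '$before'"
echo "index copy:           $index_hash_after_edit"
echo "work-tree copy now:   $new_worktree_hash"
```

The last two hash IDs should differ: the proposed next commit still holds the older contents.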
Git does offer git commit -a. What this does, in effect, is have git commit run git add -u first. The -u option tells git add that it should read through the index / staging-area. For each file that is already in the index, git add -u checks the work-tree copy to see if it needs to be git add-ed to overwrite the existing index copy. But files that are not in the index are what Git calls untracked, and git add -u skips them.
This is very important so let's say it again: In Git, an untracked file is a file that is not in the index. You can put a file into the index using git add, and you can take a file out of the index using git rm.
Whenever a file is not in the index, it is untracked, and it will not be in the next commit. However, it may still be in some existing commits. If you git checkout this old commit, Git will copy the file into the index, and now it is in the index and is tracked. If you then git checkout the new commit that doesn't have the file, Git will remove the file from the index and from the work-tree, and the file will no longer be tracked, but also will be gone from the work-tree! If you have a configuration file, and don't want it to be in every commit, try not to put it in any commit.
Again, git commit -a means scan the index, and for each file that is there (that is, tracked) check to see if a git add will update the index copy; if so, do it. This means that git commit -a makes Git work a lot like hg commit. Mercurial requires that you hg add a new file, to put the file into what Mercurial calls its manifest; hg commit, and other hg commands, use the manifest to know which work-tree files are tracked and which ones are untracked. Git uses the index, rather than the manifest, to know which work-tree files are tracked.
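Here is a small demonstration of that behavior, with hypothetical file names tracked.txt and untracked.txt. The commit made with -a picks up the change to the tracked file and leaves the new file alone:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "first" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

echo "more" >> tracked.txt                 # modify a tracked file
echo "new" > untracked.txt                 # create a file that was never added

git commit -q -a -m "Update tracked file"  # -a only considers files already in the index

in_commit=$(git ls-tree -r --name-only HEAD)
still_untracked=$(git ls-files --others)

echo "files in the new commit: $in_commit"
echo "still untracked:         $still_untracked"
```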
It's tempting to try to ignore the existence of the index, by using git commit -a all the time. I recommend that you do not do this. Git's index is ridiculously important later (it plays a huge role in git merge when there are conflicts) and you'll need to know about it. Don't try to ignore it! If you ignore it, it will come out and bite you when you least expect it.
[3] Try not to touch most of the files inside the .git directory, at least not at first: Git may not be able to recover from things you change here. You can look at it all you like, though. Note in particular that there's a file .git/config that contains your configuration data, the stuff that git config manipulates. You can run git config --edit to edit it in whatever editor you like, which is pretty handy if you have used git config to do something long and had a typo in it.
[4] The actual implementation is mostly just a file named .git/index. This file is binary and has a kind of complicated internal format. What's in the index, for each file, isn't the file itself, but rather the hash ID of an internal Git blob object. Initially, the blob object is the one from the commit. Whenever you update the copy of the file in the index, Git actually writes out a new blob object, or reuses an existing one if there is one, then makes the index entry refer to the new-or-reused blob.
[5] There is actually a lot more to it than this, but this makes for a good mental model, I think. Git really gets its speed from a lot of other tricky optimizations, many of which could be put into Mercurial but haven't been, and from being compiled rather than interpreted Python, and a lot of other things. But at a sort of fundamental level, Mercurial is required to look at the work-tree (the work-tree is the proposed next commit) and Git isn't, because the index is the proposed next commit, and that means Git's job is always going to be easier.
In a new, totally-empty repository you are in a special mode: the branch you are "on" does not exist until you make the first commit, and that first commit requires git add.
Use git add to copy files from the work-tree into the staging area. Use git rm to remove files from both the work-tree and the staging area.
git commit will simply package up whatever is in the staging area right now. Adding -a will make it run git add -u first, and git add -u will only look at tracked files.
git checkout can check out any existing commit. To do so, it will copy all the files from that commit into both the index/staging-area, and the work-tree.
If you git checkout a branch name, that branch name is now the current branch, and git commit will make the branch name identify the new commit you just made. The commit you had out earlier will be the parent of the new commit.
(There's actually a way to get into that special mode even when you do have a bunch of branches, using git checkout --orphan, but let's avoid that topic here as this is already plenty long.)
Upvotes: 1
Reputation: 2974
As Daniel A. White said, you have to add files before you commit.
Before you can create a commit you have to tell git which files you want to include in the commit (i.e., move them to the staging area). You can run git add . (which stages the content for the next commit; it is safer to use git add -u, which adds only the files that are already tracked, thanks to knittl for pointing this out) and then git commit -m "some commit message", or run git commit -am "message" (this command automatically stages the files that are already tracked and creates a commit with the given message).
Upvotes: 2