trinqueryan

Reputation: 11

How to continue in Git after closing terminal?

Beginner user here trying to learn git. I was writing a simple program, using checkout -b and commit -m with a note, to 'save' my progress along the way. I used status and branch to check and see that the branches were being made, but I closed the terminal. Now I use git status and it says I'm on a branch but lists 'no commits yet' underneath. The JS I've been writing along the way is still there.

Should I be using commit -a every time I use commit -m? I can't tell whether I'm having an issue because I'm starting up wrong or because I'm saving my branches wrong.

Upvotes: 1

Views: 2654

Answers (2)

torek

Reputation: 488213

Your files are just in your work-tree. You have not yet committed anything! There's no harm done, you can add and commit your files now. But you will definitely have to use git add first.

Long

As a beginner using Git, you will be starting in one of two "modes", as it were, and they require somewhat different approaches. There is a unifying theme underneath this but it helps to start without it! Note: this is pretty long, but I recommend you read through it all.

Mode #1: you are starting on your own, without cloning some repository

In this mode, you need to create a directory and enter it—you may have done this already—and then run git init:

$ mkdir new-project
$ cd new-project

(you might have done these steps already)

$ git init
Initialized empty Git repository in .../new-project/.git

You can now create files in the new-project directory that you are already in, or you might already have some files there. If you run git status you will get:

$ git status
On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)

(older versions of Git don't say "no commits yet" as they print a longer and stupider message that means the same thing, but is confusing.)

In this mode, there is no need to run git checkout -b. It won't really hurt if you do, but there is a weird thing going on: you're on branch master but branch master does not exist yet. If you use git checkout -b to change which branch you're on, you'll continue to have no branches. You'll be "on" whichever branch you select, and it will still not exist!

Having created some files, you must now run git add. The git add step copies the files to the staging area—a hidden file inside the .git directory, really—so that they will be in your next commit. You cannot use git commit -a here as we will see in a moment.

If you run git commit without first running git add, the commit attempt will fail:

$ echo I am a readme file > README
$ ls
README
$ git commit -m "a bad commit message"
On branch master

Initial commit

Untracked files:
        README

nothing added to commit but untracked files present
$ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        README

nothing added to commit but untracked files present (use "git add" to track)

Note that git status still says No commits yet. The git commit command did not do anything!

There is no harm done here, as you can now git add your files and git commit them. That's what you need to do in this particular case: just open a terminal window again, navigate to your project (cd new-project or whatever), and then use git add and git commit.

Mode #2: you are starting by cloning someone else's repository

In this mode, you start with:

git clone <url>

where the url part comes from someone else, perhaps GitHub for instance:

$ git clone https://github.com/git/git
Cloning into 'git'...
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Total 273170 (delta 1), reused 1 (delta 1), pack-reused 273168
Receiving objects: 100% (273170/273170), 122.67 MiB | 3.53 MiB/s, done.
Resolving deltas: 100% (202836/202836), done.
$ 

At this point you're not in the project yet, and must still navigate into it:

$ cd git

Now you have an existing project—in this case, the source code for Git—with a lot of existing commits holding a lot of existing files. The output from running ls here is long and I won't show it at all.

The key difference is that at this point, you have:

  • a branch (probably master)
  • some number of remote-tracking names, such as origin/master
  • lots of tracked files in your work-tree

You can now modify files and use git commit -a, if you like (though I recommend against it). You can use git checkout -b to create a new branch first, and this one actually does create a new branch.

Why are these so different?

Actually, they're not as different as they might look at first. The real key is that the git clone-d repository has some commits in it.

To Git beginners, it might seem that Git is all about branches. It's not, really—it's all about commits. Branches—or really, branch names like master—are important because they help Git, and thus also help you, find commits. It might also seem that Git is all about files, but it's not: it's about commits. Commits hold files, but really, the commit is the basic unit of Git.

It's also worth noting here that git clone really means:

  • Run mkdir and then enter the directory and run git init, to make a new, totally-empty repository.
  • Add the URL under a name. The standard name here is origin, though you can choose another one if you want.
  • Use the name and its stored URL to run git fetch, to obtain commits from some other Git. There needs to be a Git listening at that URL. That Git has commits; that Git gives you their commits. That Git has branch names that that Git is using to find those commits, so that Git gives you its branch names too. That Git may have tag names that it can also use to find commits; if so, that Git gives you their tag names.
  • Rename all of their branch names. These become your remote-tracking names: their master becomes your origin/master; their maint becomes your origin/maint; and so on. (If that Git gives you some tag names, your Git just takes them as-is.)
  • Pick one of their branch (or tag) names, usually master. Which one to pick is a little tricky. You can tell your Git, at git clone time, which one to pick, including tag names. If you don't pick one, their Git is supposed to recommend a particular branch name—never a tag name—and your Git will pick that one. If all else fails, your Git picks master.

As its final step, your git clone takes the name picked by the last step above and creates—or tries to create—that branch name in your repository, matching the one it got from the other Git. If we assume the name is master, you now have a master branch that matches your origin/master, and some commits. This creating-your-master is done via git checkout: your Git checks out their master branch's commit—which is your origin/master—and then adds your own name master to identify the same commit. We'll see more about this below.
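To make the renaming and picking steps concrete, here is a toy sketch in Python. It is not Git's code: the function name clone_refs and the short hash IDs are made up, and it ignores tag names and the option for choosing a branch at clone time. It only shows how their branch names become your remote-tracking names, and how one of them becomes your own branch name.

```python
# Toy model of how `git clone` sets up names; not Git's real code.
# Hash IDs are made-up short strings; tags and choosing a branch are ignored.

def clone_refs(their_branches, remote="origin", recommended=None):
    """Return (remote_tracking_names, local_branches) after a simulated clone."""
    # Rename every one of their branch names: master -> origin/master, etc.
    remote_tracking = {f"{remote}/{name}": h for name, h in their_branches.items()}
    # Pick a name: the one their Git recommends, else fall back to master.
    pick = recommended if recommended is not None else "master"
    if pick not in their_branches:
        # e.g. cloning a totally-empty repository: nothing to check out.
        return remote_tracking, {}
    # Final step: create our own branch name for the same commit.
    local_branches = {pick: remote_tracking[f"{remote}/{pick}"]}
    return remote_tracking, local_branches


names, mine = clone_refs({"master": "a1b2c3", "maint": "d4e5f6"})
print(names)  # {'origin/master': 'a1b2c3', 'origin/maint': 'd4e5f6'}
print(mine)   # {'master': 'a1b2c3'}
```

Cloning an empty repository, clone_refs({}), gives back two empty dictionaries: no remote-tracking names and no branch, which is the mode #1 situation again.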

There is one weird case: if you clone a totally-empty repository—one that has no commits—it won't have any branch or tag names either, and your Git will be unable to get a recommended branch or tag name from them and be unable to create a master branch. This puts your Git back in the situation you had in mode #1!

In other words, the real difference is whether you have some commits to start with. Git is all about commits, and with no commits, Git is kind of helpless and useless. The very first commit is really important: not its contents, not that much, but its existence. Until you have a first commit, you cannot have any branches or tags.

A branch name identifies a commit

One of the big secrets—ok, not actually secret at all—to understanding Git is to learn that a branch name like master just holds the hash ID of one single Git commit. Every Git commit has a big ugly hash ID, such as 745f6812895b31c02b29bdfe4ae8e5498f776c26. This hash ID is unique to this particular commit. No other commit can have this ID. But who wants to try to remember 745f6-something-something-whatever? We have a computer, why don't we have the computer remember it, and give it a nice simple name, like, say, master?

That's precisely what a branch name is and does: it's a nice simple name that remembers some big ugly hash ID for us. What makes branch names special, though, is that they don't just remember some hash ID. Instead, they remember the latest hash ID.

As you make new commits, Git will move your branch name—the name of whichever branch you have checked-out—so that it remembers the hash ID of the last commit. Each new commit you make will automatically remember the hash ID of the previous commit. So, by starting at the end and working backwards, Git can find its way to each previous commit:

... <-F <-G <-H   <--master

If H stands in for the hash ID of the last commit, and G stands in for H's remembered "previous commit" hash ID, we say that H points to G. The name master points to H. From H, Git can find G; from G, Git can find F. This repeats until we get all the way back to that very first commit. The first commit can't point to any earlier commit, so it just doesn't.
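The backwards walk can be sketched with a couple of Python dictionaries. This is only a mental model, not how Git stores anything: the letters stand in for real hash IDs, and here F is assumed to be the very first commit so that the walk has somewhere to stop.

```python
# Toy model: each commit remembers only its parent's hash ID (None for the first).
# "F", "G", "H" stand in for real hash IDs.
commits = {
    "F": {"parent": None, "message": "first commit"},
    "G": {"parent": "F", "message": "second commit"},
    "H": {"parent": "G", "message": "third commit"},
}
branches = {"master": "H"}  # a branch name just holds one hash ID

def log(branch):
    """Start at the commit the branch name points to and walk backwards."""
    result = []
    h = branches[branch]
    while h is not None:       # the first commit has no parent, so we stop there
        result.append(h)
        h = commits[h]["parent"]
    return result

print(log("master"))  # ['H', 'G', 'F']
```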

To make a new commit—let's call it I—we have Git write out all our files into a new commit, add our log message and our user name and all that stuff, and set I's previous commit to be commit H, so that I points back to H:

...--F--G--H   <-- master
            \
             I

Then all Git has to do is to write the hash ID of I into the name master, so that master now points to I, and the commit is done and incorporated into our Git repository:

...--F--G--H
            \
             I   <-- master

When you make new commits, the existing commits stay put. The new commit points back to some previous, existing commit. The branch name changes to point to the new commit.
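Continuing the same kind of toy model (again, letters in place of hash IDs, and a hypothetical make_commit helper rather than anything Git provides), making commit I amounts to two steps: add a new commit whose parent is the current tip, then overwrite the branch name.

```python
# Toy model: commits map hash ID -> {"parent": ..., "message": ...}.
# "G", "H", "I" stand in for real hash IDs.
commits = {
    "G": {"parent": None, "message": "earlier work"},
    "H": {"parent": "G", "message": "latest work"},
}
branches = {"master": "H"}

def make_commit(current_branch, new_id, message):
    """Write a new commit whose parent is the branch's tip, then move the name."""
    parent = branches[current_branch]
    commits[new_id] = {"parent": parent, "message": message}  # existing commits untouched
    branches[current_branch] = new_id                         # name now points to new commit

make_commit("master", "I", "new work")
print(branches["master"], commits["I"]["parent"])  # I H
```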

When you start with a totally-empty repository, though, there are no commits for any names to point to. Git requires every branch name to point to some existing commit, so no names are allowed to exist either. This is why Git is a little weird and useless on a new, totally-empty repository: with no commits to point-to, no branches can exist! You must make that first commit. From then on, you can have as many branch names as you want—though if there's only one commit, every branch name will point to that one commit!

A   <-- branch1, branch2, branch3, ..., branchN, master

Here, all branches just have the one commit, which we're calling A, but which actually has some big ugly hash ID as its real name.

Note that you can remove branch names at any time, too. If we have the above, and remove all the branchnumber names, we get back to a much more sensible:

A   <-- master

and now there's just one branch master holding the one commit that is in this repository.

Remote-tracking names work the same way, except that instead of your Git moving them, the other Git moves its branch names. You then have your Git call up their Git, find out where their branch names have moved, and have your Git move your remote-tracking names to match:

A   <-- master, origin/master

If they add a new commit B, their master will point to this new B. You run git fetch and you get:

A   <-- master
 \
  B   <-- origin/master

because your Git sees that their master now identifies (new) commit B, so your Git brings over commit B to your repository, then updates your remote-tracking name origin/master.
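A toy sketch of that fetch step follows. It simplifies a lot: real git fetch only transfers the objects you are missing that their branch names can reach, while this hypothetical fetch function just copies any commit you lack. The point it shows is that only the origin/ names move, and your own master stays where it was.

```python
def fetch(our_commits, remote_tracking, their_commits, their_branches, remote="origin"):
    """Toy `git fetch`: copy commits we lack, then update remote-tracking names."""
    for h, commit in their_commits.items():
        our_commits.setdefault(h, commit)        # commits we already have stay as they are
    for name, h in their_branches.items():
        remote_tracking[f"{remote}/{name}"] = h  # their master -> our origin/master
    # Our own branch names are not touched at all.


our_commits = {"A": {"parent": None}}
our_branches = {"master": "A"}
remote_tracking = {"origin/master": "A"}

# They made commit B on their master.
fetch(our_commits, remote_tracking,
      their_commits={"A": {"parent": None}, "B": {"parent": "A"}},
      their_branches={"master": "B"})
print(our_branches["master"], remote_tracking["origin/master"])  # A B
```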

What to know so far

  • Branch names find the last commit in a branch. If you have another Git you're calling origin, your remote-tracking names origin/* remember their branch names, which remember the last commit in their branch.
  • Each commit points backwards to its previous ("parent") commit. The first commit is kind of special, because there isn't a previous commit.
  • The set of branches that hold some commit can change when you add or remove branch names.
  • Making a new commit updates the current branch name—the one you gave to git checkout.
  • Until you have at least one commit, no branch names can actually exist, even though git checkout -b will change the name of the branch that you're "on".
  • If you're on a branch that doesn't exist—in a new, totally empty repository, for instance—making a new commit will make that branch spring into existence.

So this is how you will bootstrap your initial, totally-empty repository: by making that first commit, you'll create whatever branch you chose to be on. The usual normal way to do this is to let that first branch be called master. (Some external tools, and/or some people, will get confused if you make your first branch something else: everyone expects master. If you call it the-spanish-inquisition....)
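Here is a toy Python model of that bootstrap, under the simplifying assumption that "the branch you're on" is just a string and branch names live in a dictionary (real Git keeps this in HEAD and in refs, which this sketch does not imitate). The ToyRepo class and its methods are hypothetical names for illustration.

```python
class ToyRepo:
    """Toy model of HEAD and branch names; not how Git stores them on disk."""

    def __init__(self):
        self.head = "master"  # the branch name we are "on"
        self.branches = {}    # no commits yet, so no branch names can exist
        self.parents = {}     # commit hash ID -> parent hash ID (or None)

    def checkout_b(self, name):
        """Rough analogue of `git checkout -b name`."""
        if self.head in self.branches:
            # Normal case: the new name points to the current commit.
            self.branches[name] = self.branches[self.head]
        # Empty repository: there is no commit to point to, so we only
        # change which (still non-existent) branch we are "on".
        self.head = name

    def commit(self, new_id):
        parent = self.branches.get(self.head)  # None for the very first commit
        self.parents[new_id] = parent
        self.branches[self.head] = new_id      # may bring the branch into existence


repo = ToyRepo()
repo.checkout_b("feature")
print(repo.head, repo.branches)  # feature {}
repo.commit("A")
print(repo.branches)             # {'feature': 'A'}
```

This mirrors the question: checkout -b before the first commit changes the name you're on, but branch still lists nothing until a commit actually happens.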

A clearer picture of commits

We noted above that while Git is all about commits, each commit holds some set of files. In fact, each commit represents a full snapshot of every file—well, of every file that goes in that commit, but that seems kind of redundant. We didn't note—but it's important—that everything in every commit is frozen for all time. Nothing about any existing commit can ever be changed! The user name and email address that Git put in that commit, the log message you entered, the frozen copies of every file in the snapshot: all of these are quite impossible to change.1 The commit with that hash ID always contains those copies of files. If you have that commit, you have that version of all the files: you have that snapshot.

Now, obviously, if Git made new copies of every file every time you make a new commit, the repository would rapidly become terribly fat. So Git doesn't actually do that. When Git freezes copies of your files, it uses a couple of tricks. The first trick is that it compresses the file.2 So what's inside Git is sort of a dehydrated version of your file, that's also frozen for all time. I like to call this a freeze-dried copy of the file. The second and perhaps more important trick is that, having made this freeze-dried copy that can never be changed, Git can just keep referring back to the existing copy in every new commit that re-uses that file unchanged.

Hence, if you have six commits—six snapshots—of your files, but five of the six have the same contents for the README file, all five of those snapshots share one underlying freeze-dried README, and the sixth has its own currently-private freeze-dried copy. If a later commit wants to use the same data for any file—even if it's not named README—Git can just re-use one of those freeze-dried copies.
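The sharing works because Git names file contents by a hash of the contents. In the default SHA-1 object format, a blob's hash ID is the SHA-1 of a short header, "blob", the size and a NUL byte, followed by the file's bytes; that part below matches Git. The object_store dictionary and the freeze function are hypothetical stand-ins for the .git/objects directory, and real Git also stores tree and commit objects and may pack objects together.

```python
import hashlib
import zlib

def blob_id(data: bytes) -> str:
    """Hash ID Git gives a file's contents: SHA-1 of b'blob <size>\\0' + data."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

object_store = {}  # hash ID -> compressed ("freeze-dried") blob; stands in for .git/objects

def freeze(data: bytes) -> str:
    """Store the contents once; any later snapshot with equal contents re-uses it."""
    h = blob_id(data)
    if h not in object_store:
        object_store[h] = zlib.compress(b"blob %d\x00" % len(data) + data)
    return h


# Six snapshots; five of them have the same README contents.
readmes = [b"I am a readme file\n"] * 5 + [b"I am a changed readme file\n"]
ids = [freeze(data) for data in readmes]
print(len(set(ids)), len(object_store))  # 2 2
```

Because the ID depends only on the bytes, a file with the same contents under a different name gets the same ID and re-uses the same stored copy.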

This gives us a clearer picture of what a commit is:

  • It has some metadata to say who made it and when: author and committer, name and email address, and two time-stamps for when-authored and when-committed.
  • It has the hash ID of its predecessor or parent commit.
  • It stores freeze-dried copies of every file that goes with that particular commit. They might be shared with other commits, but there's a full copy of every file for this particular snapshot. But they're in a freeze-dried format that nothing but Git can use.
  • It has any arbitrary metadata you'd like to see later, say, in a year or two when you look back at this commit: the log message.

Putting all of this stuff together is what gets us the hash ID of the commit.

There are a couple of problems, though. The first and most obvious is: freeze-dried copies of files that only Git can read—and never change—are not going to help us get any actual work done.


1The reason for this ties into the unique hash ID for each commit. The hash ID of a commit is actually a cryptographic checksum of its contents. This means that if you take a commit out, make some change to the contents, and try to put the result back, what you get is a new and different commit with a new, unique hash ID. The original commit is still there, untouched! All you did was add a new commit. This becomes particularly important later (but not in this answer).

2Technically, what Git does is to use zlib deflate on the object, rather than on the file itself. The object is a Git blob that holds the file's data. This is not really all that crucial, although as a happy side effect, it also protects against using the one known "shattered" PDF file SHA-1 collision.


How new commits get made, or why you have to run git add

In order to get any work done, Git needs to give you the ability to extract any given commit. The place where you extract the commit is your work-tree or working tree, or some variation on this name. You run git checkout, and give it a hash ID or a branch name—something that finds some particular commit in your repository—and your Git fills in your work-tree from that commit.

In a new, totally-empty Git repository, you still have a work-tree. You can do whatever you want in this work-tree. Git just creates it as an initial, empty—well, mostly-empty—directory or folder (whichever term you prefer). In your work-tree, Git hides a sub-directory / sub-folder named .git, and Git keeps all of its files in this .git directory.3

Note that other than filling it up with files (and sub-directories if needed to hold files), and copying files back out of it when you use git add, Git doesn't actually do anything with your work-tree. The work-tree is for you, not for Git. This is where things get a little weird.

In other version control systems, such as Mercurial (hg), you work in your work-tree and then you run hg commit. The VCS scans your work-tree to see what you changed, and makes a new commit with the changes from the work-tree. Git doesn't do that—it doesn't store changes, and it doesn't scan the work-tree.

If you had just the repository itself plus the work-tree, you'd have two copies of each of the files from the current commit: the frozen copy that Git used to fill the work-tree when you did a git checkout, and the normal-format copy in the work-tree. That's how hg checkout works and it all makes perfect sense. If you want to change the file, you just change it in the work-tree. The work-tree is your proposed next commit. But that's not what Git does.

Instead, Git adds a third copy of each file. This third copy goes into what Git calls, variously, the index, or the staging area, or (rarely these days) the cache. These three names all refer to the same thing.4 The index holds the freeze-dried copy of the file, just like the commit. This freeze-dried copy is automatically ready to go into any new commit that you might make.

When you run git add, Git reads and compresses—i.e., freeze-dries—the work-tree copy of the file and stuffs that into the index / staging-area. So git add makes the file ready to commit again. If the file wasn't in the index before, now it is. If it was in the index before, it's been replaced. Either way, the file is now ready to be committed.

This means that when you run git commit, all Git has to do is package the pre-frozen files that are already in the index into a new commit. That's (mostly) why, if you have a big project and run git commit, it happens almost instantly—but if you're using Mercurial and run hg commit, it takes seconds before it responds. Mercurial has to scan the work-tree. Git can ignore the work-tree.5

This index or staging area—whichever name you prefer—is, in effect, the proposed next commit. It's in a form that makes git commit go fast. But this puts a burden on you, the programmer: you have to run git add to copy files into the index / staging-area before you commit them.
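A toy Python model of the three places a file lives may help. It is deliberately simplified (plain strings instead of freeze-dried blobs, and a list instead of commit objects), and git_add and git_commit here are hypothetical functions, not Git's API. It reproduces the situation from the question: a commit without a prior add does nothing.

```python
# Toy model of the three places a file can live; not real Git.
work_tree = {}  # your editable files: name -> contents
index = {}      # the staging area, i.e. the proposed next commit
commits = []    # each commit is a frozen copy of the index at commit time

def git_add(name):
    """Copy the work-tree version of one file into the index."""
    index[name] = work_tree[name]

def git_commit():
    """Package whatever is in the index; refuse if nothing changed."""
    previous = commits[-1] if commits else {}
    if index == previous:
        return None                 # "nothing added to commit"
    commits.append(dict(index))     # a copy, so later git_add calls can't alter it
    return len(commits) - 1


work_tree["README"] = "I am a readme file\n"
print(git_commit())   # None: README was never added
git_add("README")
print(git_commit())   # 0: the first commit now exists
```

Editing work_tree["README"] afterwards changes nothing in the proposed commit until git_add("README") runs again.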

Git does offer git commit -a. What this does, in effect, is have git commit run git add -u first. The git add -u option tells git add that it should read through the index / staging-area. For each file that is already in the index, git add -u checks the work-tree copy to see if it needs to be git added to overwrite the existing index copy. But files that are not in the index are what Git calls untracked, and git add -u skips them.

This is very important so let's say it again: In Git, an untracked file is a file that is not in the index. You can put a file into the index using git add, and you can take a file out of the index using git rm.

Whenever a file is not in the index, it is untracked, and it will not be in the next commit. However, it may still be in some existing commits. If you git checkout such an old commit, Git will copy the file into the index, and now it is in the index and is tracked. If you then git checkout the new commit that doesn't have the file, Git will remove the file from the index and from the work-tree: the file will no longer be tracked, and it will also be gone from the work-tree! If you have a configuration file and don't want it to be in every commit, try not to put it in any commit.
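Here is a toy sketch of why the file disappears. The checkout function is hypothetical and ignores an important safety check: real git checkout refuses to switch when it would overwrite uncommitted work, which this model does not attempt.

```python
def checkout(snapshot, index, work_tree):
    """Toy `git checkout`: make the index and work-tree match one commit's snapshot.

    Real Git also refuses to overwrite uncommitted work; this sketch does not check.
    """
    for name in list(index):
        if name not in snapshot:
            del index[name]             # tracked now, but absent from the target commit...
            work_tree.pop(name, None)   # ...so it vanishes from the work-tree as well
    for name, data in snapshot.items():
        index[name] = data              # copied into the index: now tracked
        work_tree[name] = data          # and into the work-tree for you to use


old_commit = {"app.js": "v1", "config": "my settings"}
new_commit = {"app.js": "v2"}
index, work_tree = dict(new_commit), dict(new_commit)

checkout(old_commit, index, work_tree)
print(sorted(work_tree))  # ['app.js', 'config']
checkout(new_commit, index, work_tree)
print(sorted(work_tree))  # ['app.js']
```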

Again, git commit -a means scan the index, and for each file that is there—is tracked—check to see if a git add will update the index copy; if so, do it. This means that git commit -a makes Git work a lot like hg commit. Mercurial requires that you hg add a new file, to put the file into what Mercurial calls its manifest; hg commit, and other hg commands, use the manifest to know which work-tree files are tracked and which ones are untracked. Git uses the index, rather than the manifest, to know which work-tree files are tracked.
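As a toy sketch (hypothetical function names, plain dictionaries in place of Git's index file), git commit -a is git add -u followed by an ordinary commit, and the untracked-file rule falls straight out of the loop: it only visits names already in the index.

```python
def git_add_u(work_tree, index):
    """Like `git add -u`: refresh only files already in the index (tracked files)."""
    for name in list(index):
        if name in work_tree:
            index[name] = work_tree[name]  # re-stage the current work-tree contents
        else:
            del index[name]                # a tracked file was deleted: stage that too
    # Files that are in work_tree but not in index are untracked: skipped.

def git_commit_a(work_tree, index, commits):
    """Like `git commit -a`: run git add -u, then commit whatever the index holds."""
    git_add_u(work_tree, index)
    previous = commits[-1] if commits else {}
    if index == previous:
        return None
    commits.append(dict(index))
    return len(commits) - 1


# The situation from the question: files exist, but nothing was ever added.
print(git_commit_a({"app.js": "code"}, {}, []))  # None: -a cannot help here
```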

It's tempting to try to ignore the existence of the index, by using git commit -a all the time. I recommend that you do not do this. Git's index is ridiculously important later—it plays a huge role in git merge when there are conflicts—and you'll need to know about it. Don't try to ignore it! If you ignore it, it will come out and bite you when you least expect it.


3Try not to touch most of the files inside the .git directory, at least not at first: Git may not be able to recover from things you change here. You can look at it all you like, though. Note in particular that there's a file .git/config that contains your configuration data—the stuff that git config manipulates. You can run git config --edit to edit it in whatever editor you like, which is pretty handy if you have used git config to do something long and had a typo in it.

4The actual implementation is mostly just a file named .git/index. This file is binary and has a kind of complicated internal format. What's in the index, for each file, isn't the file itself, but rather the hash ID of an internal Git blob object. Initially, the blob object is the one from the commit. Whenever you update the copy of the file in the index, Git actually writes out a new blob object—or reuses an existing one, if there is one—then makes the index entry refer to the new-or-reused blob.

5There is actually a lot more to it than this, but this makes for a good mental model, I think. Git really gets its speed from a lot of other tricky optimizations, many of which could be put into Mercurial but haven't been, from being compiled C rather than interpreted Python, and from a lot of other things. But at a fundamental level, Mercurial is required to look at the work-tree (the work-tree is the proposed next commit) and Git isn't, because the index is the proposed next commit, and that means Git's job is always going to be easier.


Summary

  • A Git repository has, at all times, three copies of each "active" file:
    • the frozen copy in the current commit;
    • the current staged copy: frozen format, but replaceable with git add;
    • the work-tree copy.
  • Except, that is, in a new, totally-empty repository when there is no current commit: then there are only two copies.
  • The index or staging area holds your proposed next commit.
  • Use git add to copy files from the work-tree into the staging area. Use git rm to remove files from both the work-tree and the staging area.
  • Any file that is in the staging area right now is tracked. Any file that isn't in the staging area right now is untracked.
  • git commit will simply package up whatever is in the staging area right now. Adding -a will make it run git add -u first, and git add -u will only look at tracked files.
  • git checkout can check out any existing commit. To do so, it will copy all the files from that commit into both the index/staging-area, and the work-tree.
  • If you git checkout a branch name, that branch name is now the current branch, and git commit will make the branch name identify the new commit you just made. The commit you had out earlier will be the parent of the new commit.
  • The very first commit ever is special because it allows branch names to exist. Before that point, you can change which branch name you're on, but no branch names exist. The name you're on will be created by that very first commit.

(There's actually a way to get into that special mode even when you do have a bunch of branches, using git checkout --orphan, but let's avoid that topic here as this is already plenty long.)

Upvotes: 1

IamK

Reputation: 2974

As Daniel A. White said, you have to add files before you commit.

Before you can create a commit, you have to tell Git which files you want to include in it (a.k.a. move them to the staging area). You can run git add . to stage content for the next commit, then git commit -m "some commit message". It is safer to use git add -u, which adds only files that are already tracked (thanks to knittl for pointing this out), but note that it will not pick up new, untracked files. Alternatively, git commit -am "message" automatically stages changes to already-tracked files and creates a commit with the given message; like git add -u, it skips untracked files.

Upvotes: 2
