Sharandeep

Reputation: 25

Git repository confusion

Yesterday I started learning Git, and I was baffled to find two seemingly contradictory definitions of a repository.

1st: A repository is a directory which contains your project. A repository is made up of commits.

2nd: A repository is the .git folder inside your project.

Do the two statements actually convey the same thing? If so, how?
I've seen the hidden .git folder, and it is certainly not my project.

Upvotes: 0

Views: 116

Answers (2)

torek

Reputation: 489848

As Wim Coenen put it, the second definition (that the repository is the stuff inside the .git directory) focuses on how things are organized on disk.

But, formally speaking, I have to agree with the second definition. The remaining area—the area where you do your work—is not part of the repository itself. It is merely next to the repository.

The reason for this is that the stuff inside the .git folder is Git's. You can look at it, and if you understand Git's internals—which change from one Git release to another, as Git evolves over time—you can even edit things directly here. But in general, you should leave this stuff to Git itself.

The files that are not in the .git folder are yours. You can do whatever you like with them. Git will fill in your work area, from a commit, when you ask it to.

The short version, then, is that you work in your work-tree. This area is yours, to do whatever you like. You then tell Git, at various points in time: do something. That something can:

  • copy files from your work-tree into Git's repository; or
  • copy files from Git's repository to your work-tree; or
  • do one of many other things, such as compare particular commits, view past commits, call up another Git repository and exchange commits with it, and so on.
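The first two directions can be sketched with a throwaway repository. This is only an illustration: the scratch directory, the file name notes.txt, and the "Demo" identity are made up for the example.

```shell
set -e
demo=$(mktemp -d)            # hypothetical scratch directory
cd "$demo"
git init -q                  # Git's area is created in ./.git

echo "hello" > notes.txt     # a file in the work-tree (yours)
git add notes.txt            # work-tree -> Git (into the index)
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q -m "first snapshot"

rm notes.txt                 # the work-tree copy is yours to delete
git checkout -- notes.txt    # Git -> work-tree: restore the file from the index
restored=$(cat notes.txt)

git_dir=$(git rev-parse --git-dir)   # where Git's area lives
echo "restored: $restored, git dir: $git_dir"
```

Deleting notes.txt does not lose anything that was committed, because the committed copy lives in Git's area rather than in the work-tree.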

This distinction between your work area, which is not part of the repository proper, and Git's area, which actually holds the repository, becomes even more important if you use the git worktree command, first added in Git 2.5. In particular, you can use git worktree add to create additional work-trees. Each such work-tree is not in the repository, and in fact, you can simply remove such a work-tree when you are done with it (then run git worktree prune so that Git forgets about it, or use git worktree remove, available since Git 2.17, to do both in one step).

(Git calls your work area a working tree or work-tree. This is why the command that adds a new work-tree is git worktree add.)
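As a rough sketch of the added work-tree idea, assuming Git 2.5 or later: the directory names main-tree and extra-tree, the branch name side, and the "Demo" identity are all invented for the example.

```shell
set -e
base=$(mktemp -d)            # hypothetical scratch directory
cd "$base"
git init -q main-tree
cd main-tree
echo "v1" > file.txt
git add file.txt
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q -m "initial"

# Add a second work-tree next to the first, on a new branch named "side".
git worktree add -b side ../extra-tree >/dev/null 2>&1
count_before=$(git worktree list | wc -l | tr -d ' ')

# The added work-tree holds only a small .git *file* pointing back at the
# one real repository; the repository itself stays in main-tree/.git.
ls -a ../extra-tree

# Removing the extra work-tree does not touch the repository.
rm -rf ../extra-tree
git worktree prune           # let Git forget the now-missing work-tree
count_after=$(git worktree list | wc -l | tr -d ' ')
echo "work-trees before: $count_before, after: $count_after"
```

The commits made on any work-tree all land in the same repository, which is why throwing a work-tree away is harmless once its work has been committed.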

The main theme with Git itself is that Git stores commits. Each commit in turn stores files. In fact, each commit holds a full snapshot of all files. Git's stored files use de-duplication, since most commits mostly hold the same versions of files as some other commit. They're also stored in a special, read-only, Git-only format. Only Git can actually read these files. That's why Git extracts the files to your work-tree.
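One way to see the de-duplication is to ask Git for the internal ID it would give to a file's content. In this small sketch (file names are made up), two files with identical content get the same ID, so Git stores the content only once.

```shell
set -e
demo=$(mktemp -d)            # hypothetical scratch directory
cd "$demo"
git init -q

printf 'same content\n' > a.txt
cp a.txt b.txt               # different name, identical content

id_a=$(git hash-object a.txt)
id_b=$(git hash-object b.txt)
echo "a.txt -> $id_a"
echo "b.txt -> $id_b"

# Staging both files writes a single stored object, not two.
git add a.txt b.txt
objects=$(git count-objects | cut -d' ' -f1)
echo "stored objects: $objects"
```

The same mechanism applies across commits: a file that is unchanged from one commit to the next has the same ID, so the new snapshot simply re-uses the existing stored copy.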

The part that is particularly odd is that when Git makes new commits—which is how you have Git store the updated files, after you've updated them—it makes them from copies that aren't the copies in your work-tree! If you have ever used Mercurial, which is otherwise a lot like Git, this can be kind of baffling. In Mercurial, hg commit makes a new commit from the files in your work-tree. This is simple and clear. But git commit makes the new commit from files that are in Git's index, instead of the files in your work-tree. You must keep using git add to copy any files you have updated, back into Git's index.

Hence, Git's index—which Git also calls the staging area—is what holds your proposed next commit. In Mercurial, which is easy to use, your work-tree holds your proposed next commit. In Git, the proposed next commit starts out matching the current commit. As you change files in your work-tree, you must copy the changed files back into Git's index, to change the proposed next commit.

(Git's method of making new commits gives you flexibility that is harder to achieve in Mercurial, at the cost of requiring a lot of git add commands.)
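Here is a small sketch of that behavior, using an invented file f.txt and a made-up "Demo" identity. The file is changed twice but added only once, so the commit records the added version, not what is sitting in the work-tree.

```shell
set -e
demo=$(mktemp -d)            # hypothetical scratch directory
cd "$demo"
git init -q

echo "one" > f.txt
git add f.txt
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "one"

echo "two" > f.txt
git add f.txt                # the index (proposed next commit) now holds "two"
echo "three" > f.txt         # work-tree changes again, but is NOT re-added

git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "two"

committed=$(git show HEAD:f.txt)   # what the new commit actually stores
in_worktree=$(cat f.txt)           # what is in your work area
echo "committed: $committed, work-tree: $in_worktree"
```

Running git status at the end would still list f.txt as modified, since the work-tree copy ("three") differs from both the index and the new commit ("two").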

Note: in modern Git, it is possible to separate Git's repository—the .git folder—from your work-tree, using git init --separate-git-dir. I don't know of anyone who uses this in ordinary everyday work, though.
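For the curious, a minimal sketch of what that looks like, with made-up directory names work and repo.git:

```shell
set -e
base=$(mktemp -d)            # hypothetical scratch directory
cd "$base"

# Put the repository in repo.git and the work-tree in work/.
git init -q --separate-git-dir="$base/repo.git" work
cd work

# Here .git is a small text file that points at the real repository.
cat .git
resolved=$(git rev-parse --git-dir)
echo "repository lives at: $resolved"
```

This makes the separation quite visible: the work directory contains your files plus a one-line pointer, and everything that is Git's sits elsewhere.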

Upvotes: 1

Wim Coenen

Reputation: 66783

Both definitions focus a bit too much on what a repository looks like on your local filesystem.

Conceptually, a repository is a version-controlled file tree. It contains snapshots (or "commits") of different points in time and different development branches of the same project.

When a repository is cloned locally, everything is contained in one folder. The data needed to reconstruct all the different snapshots resides in the .git subfolder. The rest of the folder represents a certain snapshot of the project, plus any uncommitted changes that you are currently making to it. At any moment, you can decide to create a new snapshot by doing a "commit". Users can share snapshots by pushing/pulling them to/from remote repositories.

The snapshots are linked together, so if you get one then you also recursively get all the other ones that it was based on. This allows you to examine the entire history of the project leading up to that state.
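The linking can be seen directly in a throwaway repository. In this sketch (file name and "Demo" identity are invented), the second commit records the ID of the first one as its parent, and walking those links yields the history.

```shell
set -e
demo=$(mktemp -d)            # hypothetical scratch directory
cd "$demo"
git init -q

echo "v1" > file.txt
git add file.txt
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "first"
first=$(git rev-parse HEAD)  # ID of the first snapshot

echo "v2" > file.txt
git add file.txt
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "second"

# The raw commit object contains a "parent" line naming the earlier snapshot.
recorded_parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')

# Following parent links from the latest commit reaches every earlier one.
history_length=$(git rev-list --count HEAD)
echo "parent: $recorded_parent, commits reachable: $history_length"
```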

Upvotes: 2
