ERJAN

Reputation: 24508

Understanding the purpose of git commit command

I have watched a few tutorials, and here is my understanding of the "commit" command, expressed as statements:

Am I correct about these statements?

Upvotes: 1

Views: 7122

Answers (3)

AnoE

Reputation: 8355

Git basically uses these three kinds of objects to store your data:

  • A blob is just a binary blob of stuff (your source code, an image, whatever). The blob does not contain information about the type or name of the content and is just a mass of bytes.
  • A tree points to blobs and to other trees (subdirectories); think "directory". It contains the file names etc. of the objects it points to.
  • A commit (the data object, not the git command) is a separate entity that points to exactly one tree (the state of all files at that point in time) and zero or more other commits (the parent(s), which may be missing for the very first commit in a repository, and which may be multiple in case of merges).

That's it. There's nothing else to it, conceptually. In practice, there are branches and tags, but these are just special "sticky notes" pointing to commits. There are also mechanisms in place to reduce storage and such, but they are not of interest unless you hack on the code or go really deep.
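The three object kinds can be sketched in a few lines of Python. This is a toy model, not git's implementation: the names store, put, put_blob, put_tree and put_commit are made up for illustration, and the tree and commit encodings are simplified (real trees are binary and real commits also record author and committer). Only the blob header format matches real git, so only blob ids agree with what git would compute.

```python
import hashlib

store = {}  # object id -> raw bytes; stands in for the object database


def put(kind: str, body: bytes) -> str:
    """Store an object under the SHA-1 of '<kind> <size>\\0<body>'."""
    raw = f"{kind} {len(body)}".encode() + b"\0" + body
    oid = hashlib.sha1(raw).hexdigest()
    store[oid] = raw
    return oid


def put_blob(content: bytes) -> str:
    # A blob is only bytes: no file name, no type information.
    return put("blob", content)


def put_tree(entries: dict) -> str:
    # Simplified text encoding (assumption): one "name id" line per entry.
    body = "".join(f"{name} {oid}\n" for name, oid in sorted(entries.items()))
    return put("tree", body.encode())


def put_commit(tree: str, parents: list, message: str) -> str:
    # Exactly one tree, zero or more parents, then the log message.
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents] + ["", message]
    return put("commit", "\n".join(lines).encode())


readme = put_blob(b"hello\n")
root = put_tree({"README": readme})
first = put_commit(root, [], "initial commit")        # no parent
second = put_commit(root, [first], "second commit")   # one parent
```

Because every id is derived from the content, storing the same bytes twice yields the same id, which is why unchanged files cost nothing extra from one commit to the next.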

Answering your question is easy in this context:

  • When you check out a commit into your working directory, you get the files from one specific commit. Say the sticky note (branch) master points to a commit afd876123, and you clone the repository into a fresh working directory; then you get the files represented in the tree that the commit afd876123 points to.
  • Git, of course, keeps track of this: it creates a special sticky note, HEAD, which records that you are on master (and thus on commit afd876123). It also creates an index, which you can think of as an anonymous tree.
  • You edit some files in your working directory.
  • When you run git add, git updates the index with your changes. While internally it works differently, you can think of it as updating the tree "index" with your changes.
  • No commit is associated with this tree (yet), so it is not a permanent thing; it does not affect pushes, pulls or whatever.
  • When you run git commit, it creates a new commit object which points to the tree that is represented by your current index, to the previous commit afd876123 as its parent, and to other information (like timestamp, log message...). This commit object is then added to the data store and thus finalized, and the sticky note master is moved to point to the new commit.
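The steps above can be sketched as a toy model in Python. Everything here is hypothetical (the ToyRepo class, its fields, and the made-up ids c1, c2); it only illustrates that git add copies from the working directory into the index, and that git commit snapshots the index, not the working directory.

```python
class ToyRepo:
    """Toy model of working directory, index, HEAD and branches (not real git)."""

    def __init__(self):
        self.commits = {}                 # commit id -> (tree, parent id, message)
        self.branches = {"master": None}  # sticky notes: branch name -> commit id
        self.head = "master"              # HEAD names the current branch
        self.index = {}                   # staged snapshot: path -> content
        self.worktree = {}                # files on disk: path -> content

    def add(self, path):
        # Stage the file's current content; nothing is committed yet.
        self.index[path] = self.worktree[path]

    def commit(self, message):
        parent = self.branches[self.head]
        cid = f"c{len(self.commits) + 1}"  # made-up id instead of a hash
        # The commit records the index as its tree, plus its parent.
        self.commits[cid] = (dict(self.index), parent, message)
        # Move the current branch's sticky note to the new commit.
        self.branches[self.head] = cid
        return cid


repo = ToyRepo()
repo.worktree["a.txt"] = "one"
repo.add("a.txt")
c1 = repo.commit("first")

repo.worktree["a.txt"] = "two"   # edited, but never staged
repo.worktree["b.txt"] = "new"
repo.add("b.txt")
c2 = repo.commit("second")

tree, parent, _ = repo.commits[c2]
print(tree, parent)
```

The second commit still contains the old content of a.txt, because the edit was never staged with add.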

The rest of your assumptions about push/pull are basically correct.

Upvotes: 2

murungu

Reputation: 2250

1) "since git uses a system of "snapshots" of the entire codebase, git needs to know history of changes and show to all coders who did what at each moment in time."

Yes, that is what version control systems do. They allow you to go back to the state of the code at a previous point in time and recover lost or deleted work. They also allow you to see who did what and when. Go to one of your versioned files and type in git annotate path/to/file and see what happens.

2) "commit is like recording the changes in project's memory."

First of all (without trying to be pedantic), the changes to your files are not stored in memory, as in RAM; they are stored on your hard drive, through the file system. Having saved your changes to the file in question, you can then think about storing those changes in git. This involves two steps: first stage the changes, then commit them.

Staging changes is also known as adding changes to the staging area, or staging changes to the index. Think of the staging area or index as a "commit under construction" that is not quite ready yet. You can add files to the staging area using git add. You can see which files have been added using git status. You can see the details of the staged changes using git diff --cached.

When you are finally satisfied that you have added all the changes that you want to commit to the staging area, you can commit your staged changes using git commit. The commit command therefore completes the "commit under construction". Internally, a new commit object is created in the git database and the branch pointer of your current branch is updated to point to this commit. This two-step mechanism gives you a line of defence against accidentally committing changes that you do not want to commit: you have to think about everything you add to the staging area before committing. Try using git add -p for very granular control over what you stage and what you don't.

3) "uploading my changed version of the project,i.e. my branch to main online repo(master) is a different thing?"

Yes, it is a different thing. Git is more of a peer-to-peer architecture than a client-server architecture. This allows you to make local commits without sharing them with others. It allows you to take in other people's work as and when you please, and to share your work with them when you are truly ready. It is possible in git to track multiple remote repositories at the same time.

That said, git has something analogous to a client-server architecture, though not the same. There are two kinds of git repositories: bare repositories, which developers use to share code with each other (analogous to the server in a client-server architecture), and non-bare repositories, which developers work on locally on their workstations (analogous to the client). To copy commits from a branch on your (non-bare) repository to a branch on the online (bare) repository which you first cloned, use git push.

A bare repository only contains what would normally be inside the .git directory, which includes the commit database, but not a checked-out copy of the versioned files, hence the name "bare". Its directory does not have to be named .git per se; the convention is to name it something along the lines of my_project.git and serve it over the network. A non-bare repository, on the other hand, is just like the repository you make your commits in: there is a hidden .git directory containing everything to do with git, alongside the files you are directly working on. By default, git refuses a push to the branch that is currently checked out in a non-bare repository, because updating that branch underneath its working directory can seriously mess up someone else's work.

4) "when I upload my local changes to the main version of the project, my commits(recorded in .git file) become known to others."

This means that they are now stored in the common bare repository. Other people will only know about those changes once they fetch them, using git fetch. Having fetched those changes, they can either merge them into their corresponding local branch using git merge, or rebase their local changes on top of your changes using git rebase. To do both steps at once they can use git pull. The pull strategy (merge or rebase) is determined by the configuration option pull.rebase, which can be set with the command git config pull.rebase true. I highly recommend rebasing over merging, as this encourages a linear history: a merge commit has two parent commits, whereas a rebased commit only has one.
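The difference in history shape can be sketched with a toy commit graph in Python. The commit names (A, B, C, M, C2) and the helpers merge and rebase_one are made up for illustration; the model tracks only parent links and ignores file contents and conflicts.

```python
# Toy history: commit id -> list of parent ids.
# A is the common base, B was fetched from the shared repository,
# C is a local commit made on top of A.
base = {"A": [], "B": ["A"], "C": ["A"]}


def merge(graph, ours, theirs, new_id):
    """Record a merge commit: one new commit with two parents."""
    graph[new_id] = [ours, theirs]
    return new_id


def rebase_one(graph, commit, onto, new_id):
    """Replay `commit` on top of `onto`: a new commit with a single parent.

    The original `commit` stays in the graph; the branch just stops using it.
    """
    graph[new_id] = [onto]
    return new_id


merged = dict(base)
m = merge(merged, "C", "B", "M")

rebased = dict(base)
c2 = rebase_one(rebased, "C", "B", "C2")

print(merged[m])    # ['C', 'B']
print(rebased[c2])  # ['B']
```

After the merge the branch tip has two parents, so the history forks and rejoins; after the rebase the branch is a straight line A, B, C2.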

5) "uploading changes to master branch is pushing all my commits, right?"

Almost right. The git push command can take arguments, but in their absence it will make sensible default inferences, using refspecs and the upstream branch configuration. When you push, you are copying commits from a branch in your non-bare repository to a branch in a bare repository. If git cannot make these inferences correctly (i.e. which branch on which repository you want to send changes to, and which local branch you are sending them from), you will have to supply these arguments explicitly to the git push command.

Upvotes: 1

Thibault D.

Reputation: 10004

  • since git uses a system of "snapshots" of the entire codebase, git needs to know history of changes and show to all coders who did what at each moment in time.

It's a reasonable way to phrase it.

  • "commit" is like recording the changes in project's memory.

Yes, committing is adding a record of your changes (compared to the previous commit) on top of the branch.

  • uploading my changed version of the project,i.e. my branch to main online repo(master) is a different thing?

When you push, the remote server appends your changes to the remote branch, provided that your history builds on the server's history. For example, if anyone has added a change to the server's history before you push, the two of you have divergent versions of the history from a certain point. So you first need to bring your local history in line with the server's history (usually using git pull, which will merge or rebase your branch depending on your choice).
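The acceptance rule described here amounts, roughly, to an ancestry check: an ordinary (non-forced) push goes through when the remote branch's current tip is an ancestor of the commit you are pushing. A minimal Python sketch, with hypothetical commit names and helper functions (is_ancestor, push_accepted):

```python
def is_ancestor(parents, ancestor, commit):
    """True if `ancestor` is reachable from `commit` by following parent links."""
    stack, seen = [commit], set()
    while stack:
        cur = stack.pop()
        if cur == ancestor:
            return True
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(parents.get(cur, []))
    return False


def push_accepted(parents, remote_tip, local_tip):
    # The remote branch may only move forward along existing history.
    return is_ancestor(parents, remote_tip, local_tip)


# A is the shared base; B was pushed by someone else; C is your local commit.
history = {"A": [], "B": ["A"], "C": ["A"]}

print(push_accepted(history, "A", "C"))  # True: nobody pushed before you
print(push_accepted(history, "B", "C"))  # False: histories have diverged

# After pulling with a merge, M has both C and B as parents.
history["M"] = ["C", "B"]
print(push_accepted(history, "B", "M"))  # True: M builds on the remote tip
```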

  • when I upload my local changes to the main version of the project, my commits(recorded in .git file) become known to others.

Yes, when you push you let others know how you modified the history.

  • uploading changes to master branch is "pushing" all my commits, right?

Pushing is "uploading" (if you wish to use that word) your changes to the remote's master branch. As I said earlier, the remote will only accept if your changes are built upon the latest history available at the remote's master.

Note that all of this is true for any branch; you can have as many branches as you want, not only master.

Upvotes: 3
