What's the meaning of git's snapshot of a file?

Question

Git thinks of its data more like a set of snapshots of a miniature filesystem

I could not understand what "snapshot" means in Git. Does Git store the entire file content in each snapshot/version? For example, version 1:

#include <stdio.h>
int main()
{
        printf("hello, world");
        return 0;
}

In version 2 I added an extra line to the file.

#include <stdio.h>
int main()
{
        printf("hello, world");
        printf("hello, git");
        return 0;
}

Will Git store the entire content, rather than only the difference (printf("hello, git")) between these two versions, as SVN etc. do?

If so, what's the point?

Nick Volynkin · Accepted Answer

Will Git store the entire content, rather than only the difference (printf("hello, git")) between these two versions, as SVN etc. do?

Conceptually, Git stores the entire contents of each version of a file. A file that did not change between two commits takes no extra space, because both commits refer to the same stored object.
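To make the "no extra space" point concrete, here is a minimal Python sketch of a content-addressed store. `ToyObjectStore` and `snapshot` are hypothetical names invented for this illustration, and the sketch leaves out everything Git really does beyond the basic idea (object headers, compression, trees, pack files).

```python
import hashlib


class ToyObjectStore:
    """Hypothetical content-addressed store; not Git's real implementation."""

    def __init__(self):
        self.objects = {}  # object id -> full file contents

    def put(self, content: bytes) -> str:
        # The id is derived from the bytes alone, so identical contents
        # always map to the same id and are stored only once.
        oid = hashlib.sha1(content).hexdigest()
        self.objects.setdefault(oid, content)
        return oid


def snapshot(store: ToyObjectStore, files: dict) -> dict:
    """Store every file in full and return a {path: object id} mapping."""
    return {path: store.put(data) for path, data in files.items()}


store = ToyObjectStore()
v1 = snapshot(store, {"hello.c": b"int main() { return 0; }\n", "README": b"demo\n"})
v2 = snapshot(store, {"hello.c": b"int main() { return 1; }\n", "README": b"demo\n"})

# Two snapshots of two files each, but README is stored only once.
print(len(store.objects))  # 3
```

Each snapshot lists every file, yet the unchanged README costs nothing the second time because both snapshots point at the same object.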

Read this brilliant answer about the Git pack file format: Are Git's pack files deltas rather than snapshots?

About SHA1

File contents are stored as "blob" objects (directories and commits are stored as objects too). Each sequence of bytes has its own SHA-1 hash, which is practically unique to it.

The following is true about SHA-1:

Computing the SHA-1 of the same contents gives the same result at any time and on any OS, Git version or implementation.
Files with different names or paths but equal contents will always have equal SHA-1 hashes.
If two files have different SHA-1 hashes, their contents are certainly different.
If two files have equal SHA-1 hashes, their contents are equal with a probability of about 1 - 1/2¹⁶⁰ (SHA-1 is a 160-bit hash), assuming the files were not deliberately crafted to collide. That is pretty much 1.
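The properties above can be checked with a short Python sketch. It assumes the blob id is the SHA-1 of the header `blob <size>\0` followed by the raw bytes; `git_blob_id` is a name made up for this example, not a Git API. For the same bytes the result should match what `git hash-object` prints, provided no content filters such as line-ending conversion are applied.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 over 'blob <size>\\0' plus the contents (illustrative helper)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The file name never enters the calculation, only the bytes do.
print(git_blob_id(b""))               # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_id(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Since the path is not an input, two files with different names and equal contents necessarily get the same id.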

What benefits this system gives

Revisions can be compared for equality very quickly. No file contents are checked, only their SHA-1 hashes.
- When you push/pull, only objects the other side does not already have are transmitted.
- Checking the status of current changes takes a moment.
- You can track N files with equal contents while using the space of a single file in Git.
Changing the revision in your working tree is very quick.
- There is no need to apply consecutive patches.
- You can exclude commits from a branch, pull them into another branch, or change their order.
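As a rough sketch of why comparing revisions is cheap, the Python below models a snapshot as a mapping from path to object id and finds changed paths by comparing ids only. The function name `changed_paths` and the short ids (`a1`, `b2`, ...) are placeholders invented for the example; real ids are 40-character SHA-1 hashes.

```python
def changed_paths(old: dict, new: dict) -> list:
    """Return paths whose object id differs between two snapshots.

    File contents are never read; only the ids are compared.
    A path missing from one side counts as changed (added or deleted).
    """
    paths = set(old) | set(new)
    return sorted(p for p in paths if old.get(p) != new.get(p))


# Hypothetical snapshots: path -> object id (ids shortened for readability).
old = {"hello.c": "a1", "README": "b2", "LICENSE": "c3"}
new = {"hello.c": "d4", "README": "b2", "NOTES": "e5"}

print(changed_paths(old, new))  # ['LICENSE', 'NOTES', 'hello.c']
```

The cost depends on the number of paths, not on the size of the files, which is the point of the list above.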

About diff (and git diff):

You may have noticed that Git does show a diff of text files, pointing out the added and removed lines. That diff is computed on demand from the stored snapshots for your convenience; it is not how the files are stored. It also helps collect contribution statistics and is used when resolving merge conflicts. Nevertheless, Git treats and stores text (and binary) files as single blobs.
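To illustrate that a diff can be derived from two full snapshots whenever it is needed, here is a small Python sketch using the two versions from the question. It uses the standard library's `difflib`, which is not the algorithm Git uses, and the `<stdio.h>` header is an assumption, since the include line in the question lost its header name.

```python
import difflib

version1 = '''#include <stdio.h>
int main()
{
        printf("hello, world");
        return 0;
}
'''

version2 = '''#include <stdio.h>
int main()
{
        printf("hello, world");
        printf("hello, git");
        return 0;
}
'''

# Both versions are complete files; the diff is computed from them on demand.
diff = list(difflib.unified_diff(
    version1.splitlines(keepends=True),
    version2.splitlines(keepends=True),
    fromfile="version 1",
    tofile="version 2",
))
print("".join(diff), end="")

# Lines added in version 2, excluding the '+++' file header line.
added = [line for line in diff if line.startswith("+") and not line.startswith("+++")]
```

Nothing but the two whole files is kept, and the single added line is recovered only when someone asks for it.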

Exclusion with git add --patch

There is a way to make Git break the changes in a text file into hunks when staging. This may be useful for very large files, but is pretty useless for small ones. The staged result is still stored as a whole blob.

git add --patch

Interactively choose hunks of patch between the index and the work tree and add them to the index. This gives the user a chance to review the difference before adding modified contents to the index.

These are my favourite illustrations about Git from Pro Git:

[image: illustration from Pro Git]
