Jichao

Reputation: 41805

What's the meaning of git's snapshot of a file?

I'm reading git basics

Git thinks of its data more like a set of snapshots of a miniature filesystem

I don't understand what "snapshot" means in Git. Does Git store the entire file content in each snapshot/version? For example, version 1:

#include <stdio.h>
int main()
{
        printf("hello, world");
        return 0;
}

In version 2 I added an extra line to the file.

#include <stdio.h>
int main()
{
        printf("hello, world");
        printf("hello, git");
        return 0;
}

Will Git store the entire content, rather than storing only the difference (printf("hello, git")) between these two versions, as SVN etc. do?

If so, what's the point?

Upvotes: 2

Views: 619

Answers (2)

jthill

Reputation: 60295

Will git store the entire content rather than store only the difference? [... and if so] what's the point?

Yes. That's what makes constructing good git histories so much simpler, and counterintuitively enough it also results in better compression efficiency.

(edit: relegate lotsa pedantry and elaboration to the revision history)

Upvotes: 1

Nick Volynkin

Reputation: 15109

Will Git store the entire content, rather than storing only the difference (printf("hello, git")) between these two versions, as SVN etc. do?

Git stores the entire contents of each version of a file. But an unchanged file takes no extra space: the new snapshot simply points to the blob that is already stored.
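To make that concrete, here is a minimal sketch of a content-addressed store in Python. It is a toy model, not Git's implementation: `ToyStore`, `put` and `snapshot` are hypothetical names, and the file names and the README content are made up. Only the blob ID formula (SHA-1 over a `blob <size>` header, a NUL byte, then the content) mirrors what Git does.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Git hashes a small header ("blob <size>" + NUL byte) followed by the raw bytes.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ToyStore:
    """Hypothetical object store: full contents, keyed by their hash."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        # Content that is already stored is not stored a second time.
        self.objects.setdefault(oid, content)
        return oid

    def snapshot(self, files: dict) -> dict:
        # A snapshot is just a mapping from path to blob ID.
        return {path: self.put(data) for path, data in files.items()}


HELLO_V1 = b'''#include <stdio.h>
int main()
{
        printf("hello, world");
        return 0;
}
'''

HELLO_V2 = b'''#include <stdio.h>
int main()
{
        printf("hello, world");
        printf("hello, git");
        return 0;
}
'''

README = b"A sample project.\n"  # made-up second file that never changes

store = ToyStore()
v1 = store.snapshot({"hello.c": HELLO_V1, "README": README})
v2 = store.snapshot({"hello.c": HELLO_V2, "README": README})

print(len(store.objects))            # number of distinct blobs stored
print(v1["README"] == v2["README"])  # unchanged file: same blob in both snapshots
```

Two snapshots of two files end up as three stored blobs, not four: both versions of hello.c are kept in full, while README is stored once and referenced from both snapshots.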

Read this brilliant answer about the Git pack file format: Are Git's pack files deltas rather than snapshots?

About SHA1

File contents (and other kinds of objects) are stored in the form of a "blob". Each sequence of bytes has its own SHA-1 hash, which is, for practical purposes, unique to it.

The following is true about SHA1:

  1. Calculating the SHA-1 of a file gives the same result at any time, on any OS, and with any Git version or implementation.
  2. Files with different names or paths but equal contents will always have equal SHA-1s.
  3. If two files have different SHA-1s, their contents are certainly different.
  4. If two files have equal SHA-1s, their contents are equal with a probability that is pretty much 1: SHA-1 is a 160-bit hash, so two unrelated files collide by accident with a probability of about 1 / 2^160.
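The first two properties are easy to check by hand. The sketch below computes a blob ID the way `git hash-object` does for a file's contents: SHA-1 over the header `blob <size>`, a NUL byte, and then the bytes. The function name `git_blob_sha1` and the example paths are my own; note that the path never enters the hash.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    # Header is "blob <size in bytes>" followed by a NUL byte, then the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two "files" with different (made-up) paths but identical contents.
files = {
    "src/hello.txt": b"hello, git\n",
    "backup/copy-of-hello.txt": b"hello, git\n",
}
ids = {path: git_blob_sha1(data) for path, data in files.items()}

print(ids["src/hello.txt"] == ids["backup/copy-of-hello.txt"])  # → True

# The well-known ID of the empty blob, the same in every repository.
print(git_blob_sha1(b""))  # → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the header contains only the object type and size, the ID depends on nothing but the bytes themselves, which is why it is stable across machines and Git versions.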

Benefits of this system

  1. Revisions can be compared for equality very quickly. No file contents are checked, just their SHA-1s.
    • When you push/pull, only new or changed objects are transmitted.
    • Checking the status of current changes is done in a moment.
    • You can track N files with equal contents while taking up the space of only a single file in Git.
  2. Changing the revision in your working tree is very quick.
    • No chain of consecutive patches has to be applied.
    • You can exclude commits from a branch, pull them to another branch, or change their order.
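As a rough illustration of the first point, two snapshots can be compared by looking only at blob IDs. This is a simplified sketch: the snapshots are flat dictionaries from path to ID, whereas Git uses nested tree objects, and `changed_paths` and the LICENSE/COPYING files are hypothetical.

```python
import hashlib


def blob_id(content: bytes) -> str:
    # Same formula Git uses for blobs: "blob <size>" + NUL + content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def changed_paths(old: dict, new: dict) -> list:
    """Paths whose blob ID differs, or that exist in only one snapshot."""
    all_paths = old.keys() | new.keys()
    return sorted(p for p in all_paths if old.get(p) != new.get(p))


LICENSE_TEXT = b"Some license text.\n"  # made-up content shared by two paths

v1 = {
    "hello.c": blob_id(b'printf("hello, world");\n'),
    "LICENSE": blob_id(LICENSE_TEXT),
    "COPYING": blob_id(LICENSE_TEXT),
}
v2 = {
    "hello.c": blob_id(b'printf("hello, world");\nprintf("hello, git");\n'),
    "LICENSE": blob_id(LICENSE_TEXT),
    "COPYING": blob_id(LICENSE_TEXT),
}

print(changed_paths(v1, v2))           # → ['hello.c']
print(v1["LICENSE"] == v1["COPYING"])  # → True
```

The comparison never reads file contents, only 40-character IDs, and the two license files share one ID, so a store keyed by ID would hold their content once.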

About diff (and git diff):

You may have noticed that Git does show a diff of text files, pointing out the added and removed lines. That diff is not what is stored: it is computed on demand by comparing two snapshots, for your convenience. It also helps to collect contribution statistics, and it is used for resolving merge conflicts. Nevertheless, Git treats and stores text (and binary) files as single blobs.
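The idea that a diff is derived from two complete versions, rather than stored, can be sketched with Python's standard `difflib`. This is not Git's diff algorithm, only an illustration using the two versions from the question; the `a/hello.c` and `b/hello.c` labels are just there to imitate the familiar output.

```python
import difflib

VERSION_1 = '''#include <stdio.h>
int main()
{
        printf("hello, world");
        return 0;
}
'''

VERSION_2 = '''#include <stdio.h>
int main()
{
        printf("hello, world");
        printf("hello, git");
        return 0;
}
'''

# Both inputs are full file contents; the diff is produced only when asked for.
diff = list(difflib.unified_diff(
    VERSION_1.splitlines(keepends=True),
    VERSION_2.splitlines(keepends=True),
    fromfile="a/hello.c",
    tofile="b/hello.c",
))
print("".join(diff))

# Added and removed lines, skipping the "+++" / "---" file headers.
added = [l[1:].strip() for l in diff if l.startswith("+") and not l.startswith("+++")]
removed = [l[1:].strip() for l in diff if l.startswith("-") and not l.startswith("---")]
print(added)    # → ['printf("hello, git");']
print(removed)  # → []
```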

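An exception: git add --patch

There is a way to make Git break the changes in a text file into chunks (hunks) when staging. This may be useful for very large files, but is pretty useless for small ones. Even then, what ends up in the index is a whole blob containing only the hunks you selected.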

git add --patch 

Interactively choose hunks of patch between the index and the work tree and add them to the index. This gives the user a chance to review the difference before adding modified contents to the index.

These are my favourite illustrations about Git from Pro Git:

[Two images from Pro Git]

Upvotes: 1
