Asad Moosvi

Reputation: 490

Why does Git store the file in the repository when I just add it to the staging area?

From what I understand, when you git add a file, it merely stages the file before it is added to the repository. So why can I see it in the Git repository before I've committed it?

For instance, if I create a new git repository and create a new file called foo and add the contents "hello world" into it, and then git add foo, I see a new item in the objects subdirectory inside the .git folder. I can even view the contents of the new file inside the objects file with the git cat-file -p command.

What exactly has been added to the .git/objects folder? What does staging a file technically do? Like what are the steps that take place after a git add is run on a file? Maybe I'll understand it better if I know the steps.

Upvotes: 4

Views: 407

Answers (2)

VonC

Reputation: 1323333

I can even view the contents of the new file inside the objects file with the git cat-file -p command.

Or git diff --cached.

What exactly has been added to the .git/objects folder?

See "Git Internals - Git Objects"

You can see a file in the objects directory. This is how Git stores the content initially – as a single file per piece of content, named with the SHA-1 checksum of the content and its header. The subdirectory is named with the first 2 characters of the SHA-1, and the filename is the remaining 38 characters.
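As a rough illustration of that layout, here is a minimal Python sketch of what reading a loose object back involves, similar in spirit to git cat-file -p for a blob. The function names loose_object_path and read_loose_object are made up for this example and are not part of Git; the sketch assumes a SHA-1 repository and an object that has not been packed.

```python
import zlib
from pathlib import Path


def loose_object_path(git_dir, sha1_hex):
    """Hypothetical helper: map a 40-character SHA-1 to its loose-object path."""
    # First 2 hex characters name the subdirectory, the remaining 38 the file.
    return Path(git_dir) / "objects" / sha1_hex[:2] / sha1_hex[2:]


def read_loose_object(git_dir, sha1_hex):
    """Hypothetical helper: return (type, content) of a loose object."""
    compressed = loose_object_path(git_dir, sha1_hex).read_bytes()
    raw = zlib.decompress(compressed)
    # The stored data is "<type> <size>\0<content>".
    header, _, content = raw.partition(b"\x00")
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode("ascii"), content
```

Called with the path to a .git directory and the ID of a freshly staged file, this should return "blob" and the file's bytes, which is why the content is already visible before any commit exists.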

why can I see it added into the git repository before I've committed it?

Don't forget that with git add -p (--patch) you can add portions (hunks) of a file before a commit.

The goal of the index remains (in the context of git add) to prepare the next commit.
It reflects the original need for Git, as created by Linus Torvalds in 2005, which was to integrate patches (it was just a merge manager at first).

what are the steps that take place after a git add is run on a file

While the index itself does not contain any file content (it records each staged path together with the SHA-1 of its blob), the .git/objects directory does, as a loose (non-packed) object.

See "Object Storage" for the steps:

  • Git constructs a header that starts with the type of the object, in this case a blob. Then it adds a space, followed by the size of the content in bytes, and finally a null byte.
  • Git concatenates the header and the original content, and then calculates the SHA-1 checksum of that new content.
  • Git compresses the new content with zlib.
  • Finally, Git writes the zlib-deflated content to an object on disk. Git determines the path of the object to write out: the first two characters of the SHA-1 value are the subdirectory name, and the last 38 characters are the filename within that directory.
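The steps above can be sketched in a few lines of Python. This is only an approximation of the object-writing half of git add, not Git's actual implementation: build_blob and write_loose_object are hypothetical names, the sketch assumes SHA-1 object IDs, and it leaves out the index update that git add also performs.

```python
import hashlib
import zlib
from pathlib import Path


def build_blob(content: bytes):
    """Hypothetical helper: return (sha1_hex, compressed_bytes) for a blob."""
    # Step 1: header = object type, a space, the content size, a null byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    # Step 2: hash the header and content together.
    store = header + content
    sha1_hex = hashlib.sha1(store).hexdigest()
    # Step 3: compress header + content with zlib.
    return sha1_hex, zlib.compress(store)


def write_loose_object(git_dir, content: bytes) -> str:
    """Hypothetical helper: store content as a loose blob under git_dir/objects."""
    sha1_hex, compressed = build_blob(content)
    # Step 4: first 2 hex characters = subdirectory, last 38 = filename.
    path = Path(git_dir) / "objects" / sha1_hex[:2] / sha1_hex[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # identical content is stored only once
        path.write_bytes(compressed)
    return sha1_hex
```

The ID returned for a given byte string should match what git hash-object reports for a file with the same bytes. The compressed bytes on disk may differ from Git's own output if the zlib settings differ, but they decompress to the same header and content.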

Upvotes: 3

Jörg W Mittag

Reputation: 369428

The staging area is part of the repository.

I think you are confusing the repository's object database with the history. Only commits are part of the history, but all objects Git handles are part of the object database.

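Think about it: Git doesn't stay resident in memory, so where else would it record what is part of the staging area than on disk, inside the repository? The list of staged paths lives in the index file (.git/index), and the staged content itself lives in the object database.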

Upvotes: 5
