What is a blob under the hood?

Question

I read on the official git website that:

The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes. These systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they keep as a set of files and the changes made to each file over time, (...)

Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a set of snapshots of a mini filesystem. Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. (...)

So I was wondering: if snapshots and not changes are saved, does that mean that if I change just one character in a 10 KB file, a second 10 KB file (or blob) will be created in my repository?

What is a blob under the hood? The file itself? Is it compressed? Does any small change to my file grow the repository drastically?

Since I know you guys, I'll answer your comments before they come: I understand that disk space is not a problem anymore and that I don't have to worry about copying 10 KB; my question is just to satisfy my curiosity.

EDIT

Ok, Git's blob data and diff information gives half of the information. But is it compressed and/or space-optimized in any way?

bperson · Accepted Answer

(Quick and noobish answer)

It gets compressed when your repo is packed. From what I know, Git will sometimes reverse the diff so that the version stored in full is the latest one, and the diffs are against the older ones. This makes accessing the latest changes quicker.
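To make the "what is a blob" part concrete, here is a minimal Python sketch of how a loose (not yet packed) blob is built. It assumes a repository using the default SHA-1 object format; the function name `git_blob` and the sample contents are made up for illustration, and the zlib compression level Git actually uses may differ from Python's default.

```python
import hashlib
import zlib


def git_blob(content: bytes) -> tuple:
    """Return (object id, loose-object bytes) for a blob holding `content`.

    Hypothetical helper: mirrors the loose-object layout, assuming SHA-1.
    """
    # A blob is the raw file content prefixed with a small header:
    # the word "blob", a space, the content length in bytes, and a NUL byte.
    store = b"blob " + str(len(content)).encode("ascii") + b"\0" + content
    # The object id is the SHA-1 of header + content (no file name involved).
    oid = hashlib.sha1(store).hexdigest()
    # On disk, the loose object is the zlib-deflated header + content.
    return oid, zlib.compress(store)


# Two versions of a roughly 10 KB file that differ by a single character.
original = b"hello world\n" * 800
edited = b"j" + original[1:]

oid_a, loose_a = git_blob(original)
oid_b, loose_b = git_blob(edited)

print(oid_a != oid_b)                 # a one-character change gives a new blob
print(len(original), len(loose_a))    # raw size vs. compressed loose size
```

So yes, a one-character change produces a second full blob at first, but each blob is already zlib-compressed as a loose object. The delta storage described above only kicks in once objects are packed (for example by `git gc`).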
