everett1992

Reputation: 2671

How can I fetch a single file at a specific commit (by hash) from a remote git repository?

I'd like to fetch a single file as of a given commit from a remote git repository without fetching every object in the repository. I know git archive doesn't work for this, as it can only fetch the tip of a branch.

With sparse-checkout and protocol v2 (thanks @bk2204) I can create a work tree containing only the README at a given commit, but git still transfers tens of thousands of objects, about 188 MiB in total.

mkdir linux
cd linux
git init
git config core.sparseCheckout true
git config protocol.version 2
git remote add origin git@github.com:torvalds/linux.git
echo "/README" > .git/info/sparse-checkout
git fetch --depth 1 origin ab02b61f24c76b1659086fcc8b00cbeeb6e95ac7
git checkout ab02b61f24c76b1659086fcc8b00cbeeb6e95ac7
remote: Enumerating objects: 71432, done.
remote: Counting objects: 100% (71432/71432), done.
remote: Compressing objects: 100% (66651/66651), done.
remote: Total 71432 (delta 5277), reused 25451 (delta 3920), pack-reused 0
Receiving objects: 100% (71432/71432), 188.85 MiB | 7.71 MiB/s, done.
Resolving deltas: 100% (5277/5277), done.

Ideally this operation would fetch only three objects: the commit (whose SHA is already known), the commit's tree, and the blob for the file listed in that tree.

$ git cat-file -p ab02b61f24c76b1659086fcc8b00cbeeb6e95ac7 | grep tree
tree f6760b0bf32bd3b9a760d6e895c7fb76cd9c2ef8
$ git cat-file -p f6760b0bf32bd3b9a760d6e895c7fb76cd9c2ef8 | grep README
100644 blob 669ac7c32292798644b21dbb5a0dc657125f444d    README
$ git cat-file -p 669ac7c32292798644b21dbb5a0dc657125f444d
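The same commit, tree, blob walk can be tried offline. The following is a minimal sketch that builds a throwaway repository containing a single README (the repository, its content and the demo identity are assumptions made for illustration, not part of the linux repository) and then resolves the file in exactly three object lookups:

```shell
#!/bin/sh
set -eu

# Build a throwaway repository with a single README so the walk runs offline.
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'hello\n' > README
git add README
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add README'

commit=$(git rev-parse HEAD)
# Object 1: the commit, whose header names its root tree.
tree=$(git cat-file -p "$commit" | awk '$1 == "tree" { print $2; exit }')
# Object 2: the root tree, which lists README together with its blob id.
blob=$(git cat-file -p "$tree" | awk '$4 == "README" { print $3 }')
# Object 3: the blob, i.e. the file content itself.
content=$(git cat-file -p "$blob")
echo "$content"
```

This only illustrates which objects are involved; it does not make a remote send just those three.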

Upvotes: 1

Views: 534

Answers (2)

ElpieKay

Reputation: 30938

A hosting service usually provides an API to download a file at a given revision or to retrieve its content. For example, GitLab has GET /projects/:id/repository/files/:file_path/raw, GitHub has GET /repos/:owner/:repo/contents/:path, and Gerrit has GET /projects/{project-name}/commits/{commit-id}/files/{file-id}/content.
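As a rough sketch of the GitHub variant: the contents endpoint accepts a ref query parameter, which can be a commit hash, and a raw media type in the Accept header returns the file body instead of JSON. The helper names github_contents_url and fetch_file_at_commit are made up for this example, and the download step assumes curl and network access, so only the URL builder is actually invoked here:

```shell
#!/bin/sh
# Print the contents-API URL for one file at one commit.
# Arguments: owner, repo, path, commit SHA (all supplied by the caller).
github_contents_url() {
    printf 'https://api.github.com/repos/%s/%s/contents/%s?ref=%s\n' "$1" "$2" "$3" "$4"
}

# Download the raw file body; needs curl and network access (not called below).
fetch_file_at_commit() {
    curl -fsSL -H 'Accept: application/vnd.github.raw+json' \
        "$(github_contents_url "$1" "$2" "$3" "$4")"
}

github_contents_url torvalds linux README ab02b61f24c76b1659086fcc8b00cbeeb6e95ac7
```

Calling fetch_file_at_commit with the same four arguments should then print the README at that commit, subject to the service's rate limits and file size limits.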

For a plain self-hosted repository, you can let git fetch retrieve an arbitrary commit or object by setting uploadpack.allowReachableSHA1InWant=true or uploadpack.allowAnySHA1InWant=true on the server. Both default to false for safety and performance reasons. A self-hosted Gerrit has similar configuration options; I don't know about a self-hosted GitLab.
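A small local experiment shows the effect of the setting. In this sketch, the "server" and "client" directories, the file:// URL and the demo identity are all assumptions standing in for a real self-hosted setup. The client is forced onto the original protocol so that the server-side option is what permits the request, and it asks for the older, non-tip commit by hash:

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# "server" stands in for a self-hosted repository with two commits.
git init -q server
cd server
git config user.name demo
git config user.email demo@example.com
echo one > README
git add README
git commit -q -m first
first=$(git rev-parse HEAD)
echo two > README
git commit -q -am second
# Let clients request any commit reachable from a ref, not only ref tips.
git config uploadpack.allowReachableSHA1InWant true
cd ..

# Client: fetch the older commit by hash over the original (v0) protocol.
git init -q client
cd client
git -c protocol.version=0 fetch -q --depth 1 "file://$tmp/server" "$first"
fetched_type=$(git cat-file -t "$first")
echo "$fetched_type"
```

The fetch still transfers the whole snapshot of that commit; the option only changes which hashes may be requested.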

Upvotes: 1

bk2204

Reputation: 76884

In general, you cannot fetch individual objects without partial clone support. The protocol doesn't allow it. Sparse checkout doesn't prevent you from fetching all of the data; it only prevents you from checking it all out.

I'm not aware of any major Git hosting providers that have generally available partial clone support right now, although I suspect it will be coming soon. The feature is still relatively experimental.

However, if you're using a remote that supports protocol v2, you can fetch a specific commit, even if you normally wouldn't be able to without protocol v2. Run git config protocol.version 2 and you'll then be able to fetch individual commits by hash. Combining that with --depth 1 is the best you can do in this particular case.
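For a remote that does allow partial clone filters, a sketch of the combined approach might look like the following. The local "server" repository, its two files and its uploadpack settings are assumptions used so the example can run without a hosting provider; a real host has to permit filters itself. The fetch is shallow and blob-less, and reading the file afterwards should make git request only that one blob:

```shell
#!/bin/sh
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# Local stand-in for a remote that permits object filters (an assumption).
git init -q server
cd server
git config user.name demo
git config user.email demo@example.com
echo hello > README
echo big > OTHER
git add README OTHER
git commit -q -m 'add files'
sha=$(git rev-parse HEAD)
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
cd ..

git init -q client
cd client
git config protocol.version 2
git remote add origin "file://$tmp/server"
# Shallow, blob-less fetch: the commit and its trees arrive, file contents do not.
git fetch -q --depth 1 --filter=blob:none origin "$sha"
# Reading the file asks the remote for just this blob, on demand.
readme=$(git cat-file -p "$sha:README")
echo "$readme"
```

On a large repository this still downloads every tree of the commit, so it is more than the ideal three objects, but it avoids transferring the contents of files that are never read.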

Upvotes: 2
