itthrill

Reputation: 1376

Is it possible to download a single file from specified commits on a remote?

I'm trying to check out a particular file from particular commits on the remote.

Please note that the commit is not in the local repo and is only part of the remote repo.

  1. I do not want to download the raw file from the GitHub/Bitbucket web interface, because my remote is not hosted on such a platform.

  2. I do not want to do git fetch followed by git checkout, because git fetch will download a bunch of other items I don't want. I'm only interested in that particular file from that particular commit.

Upvotes: 4

Views: 2069

Answers (5)

jthill

Reputation: 60295

edit: from comments:

I need to check out the same file from hundreds of commits spread over a dozen branches.

For this, you're going to need cooperation from the other repo's admin.

In Git, history is published by giving it a refname (branch, tag, whatever) and some sort of access via shared filesystem or hosted server.

The stuff that's not worth giving its own refname is either part of published history (that does have its own refname) or it's not.

If it is, Git will ensure you get a complete, internally-consistent pack that brings you up to date with the published history you asked for. Git's laser-focused on making that specific operation as fast and efficient as possible.

If it's not, then the hosting repo hasn't published it and (a) you ordinarily can't get it at all, and (b) you ordinarily don't even know how to ask for it, its object id.

To find an object's id, you have to hunt through history examining snapshots, ... which means you have to have the snapshots ... see?

Git doesn't like paying overhead costs twice, and it's built to be a VCS. You're trying to use it like a shared filesystem. Filesystems are built to be efficient at serving single objects frequently and repeatedly to the same client. DVCSes are built to be efficient at serving multiple complete revisions, at relatively long intervals, once per client. This is engineering-tradeoff territory: you can't be superbly efficient at both, and the better you get at one or the other, the harder it is to re-tool and do the other thing.

All that said: if you can get the other repo admin to do some custom work for you, this won't be hard:

git rev-list --branches --objects -- path/to/file | git pack-objects pack

will pack up the history of all branches' versions of that file: the commits that introduce new versions, the trees that show where they go, and their contents, and put it in two files named pack-<hashcode>.{idx,pack}. Put that pack in any repo's objects/pack directory and there you are: you've got everything you need to deal with just that file.
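
On the receiving side, the install is just a copy; a minimal sketch, with pack-<hashcode> standing in for whatever pack-objects printed:

# run from the top level of the receiving repo
cp pack-<hashcode>.pack pack-<hashcode>.idx .git/objects/pack/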

Such a sliced-up history is relatively difficult to work with, and the overhead of filling in the missing bits on demand is precisely what Git's built to avoid, but to work with exactly what you've got, you can use e.g. git verify-pack -v to show the exact contents of a pack and git cat-file -p to print individual objects. The commits in that pack are the ones that introduce new versions; you refer to your file in one of those by appending :path/to/file to its commit id.
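
For example, with 1a2b3c4 standing in for one of the commit ids found in the pack:

git cat-file -p 1a2b3c4:path/to/file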

So, when you run verify-pack to see what you've got, you'll get a dump of waaayyyy too much information about its content and structure. To make it useful for your purposes here, you can scrape just the commit ids out and list them in date order, with

# this is the pack I made for testing 
git verify-pack -v .git/objects/pack/pack-8d3bb7bca6a4cdc086778ad55c79f45e672ae7e5.idx \
| awk '$2=="commit"{print $1}' \
| git rev-list --stdin --date-order --no-walk

Sub in log for rev-list to see the log messages, or show the blob you fetched with e.g. git show <commit-hash>:path/to/file. To show the blobs in time sequence you can run

# --no-commit-header (Git 2.33+) keeps rev-list's "commit <sha>" header lines out of cat-file's input
git verify-pack -v .git/objects/pack/pack-8d3bb7bca6a4cdc086778ad55c79f45e672ae7e5.idx \
| awk '$2=="commit"{print $1}' \
| git rev-list --stdin --date-order --no-walk --no-commit-header --pretty=%h:path/to/file \
| git cat-file --batch

which will dump the content in scannable form.

. . . actually, if an all-in-one dump of the history will do, and you just need the content and sequence to match (not so much the resulting commit ids), Git's fast-export might do the whole job in one shot. Have the admin run

git fast-export --branches -- path/to/file | zstd >my-stuff.zst

which might even be more compact than the pack files (since it doesn't have to preserve ids), and have them ship that to you.
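
On your end, replaying that dump is the usual fast-import round trip; a minimal sketch, assuming the my-stuff.zst file from above:

git init file-history
cd file-history
zstd -dc ../my-stuff.zst | git fast-import
git log --oneline --all -- path/to/file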

Upvotes: 2

ElpieKay

Reputation: 30868

It's possible, but it may not work if the server-side configuration is not under your control.

Git has a built-in command which can retrieve a file in a specific commit.

git archive --remote=<url_to_the_repo> <commit> --format=tar <path> | tar xvf -

Some hosting services don't allow git archive --remote. If so, this command cannot work at all.

Some hosting services disable fetching unadvertised objects. If so, <commit> must be a valid ref, not a raw SHA-1 value. With a ref, you can only get the file as of the ref's tip commit; a commit not referenced by any ref can't be retrieved this way.
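
A concrete (hypothetical) example against a plain SSH remote, extracting one file as of a tag:

git archive --remote=ssh://git@example.com/repo.git --format=tar v1.2.3 path/to/file.ext | tar xvf -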

Another possible but impractical method is to create tags for all the blob objects of the files you're interested in, and then fetch those tags to retrieve their contents.
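
A minimal sketch of that idea, with made-up names; a lightweight tag can point directly at a blob object:

# done by someone with the full history, e.g. on the server side
git tag file-v1 abc123:path/to/file.ext
# on your side: fetch just that tag, then print the blob it points to
git fetch origin refs/tags/file-v1:refs/tags/file-v1
git cat-file blob file-v1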

One of the more practical methods is to fetch all the necessary data and use git show <commit>:<path> to read the content. It takes time and disk space, but it's very reliable. And avoid git checkout if possible, to save a bit of time and space.
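
For instance, with abc123 standing in for one of the commits of interest:

git fetch origin
git show abc123:path/to/file.ext > file.ext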

Upvotes: 2

aalbagarcia

Reputation: 1044

Unfortunately, you cannot do what you describe in your question and your comments.

git doesn't work at the file level like Subversion or other source code management systems; it works at the snapshot level. Every commit is like a snapshot of your code (this is a very simplified model of how git works; it's more complicated under the hood). Therefore, the only way to get the files you want is to:

  • first, get the snapshots from the server to your local machine (git-fetch)
  • second, once you have the snapshots, extract files from them (git-checkout), as sketched below.
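
A minimal sketch of those two steps, with abc123 standing in for the commit you care about:

git fetch origin                           # 1. bring the snapshots down
git checkout abc123 -- path/to/file.ext    # 2. extract the file from one snapshot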

The answer from @hobbs to this same question shows you how to do it.

Upvotes: 1

hobbs

Reputation: 239980

I do not want to do git fetch followed by git checkout because doing git fetch will download a bunch of other items which I don't want.

You need to do git fetch. That is the way you get the remote server to send you stuff. You can, however, minimize the amount of "extra stuff" that it sends you, using something like

git fetch --force --depth 1 origin $COMMIT_SHA:tmp

which will fetch just the commit $COMMIT_SHA (and all of the files needed to complete it — AFAIK you can't avoid that) from the remote origin, and name it tmp. The --force will prevent failure if a branch named tmp already exists (good for repeated use, but use with care, of course).

Then you can git cat-file blob tmp:somepath or git checkout tmp -- somepath or whatever you want to access the file contents.

If you git branch -D tmp ; git gc when you're done, there should be virtually no accumulated cruft.
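
Putting the whole cycle together, a sketch that assumes the server permits fetching an unadvertised commit by SHA (e.g. uploadpack.allowAnySHA1InWant is enabled on the server):

COMMIT_SHA=<full-sha1-of-the-commit>
git fetch --force --depth 1 origin $COMMIT_SHA:tmp
git show tmp:somepath > somepath.copy      # or: git checkout tmp -- somepath
git branch -D tmp
git gc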

Upvotes: 2

Tim Biegeleisen

Reputation: 521409

If you happen to know a particular remote branch which contains this commit, you can fetch that branch alone:

git fetch origin some_branch

Then, check out the file at the exact commit you want:

git checkout abc123 -- path/to/your/file.ext
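
If you'd rather not touch your working tree, you can read the same file straight from the fetched commit; abc123 is the commit id from above:

git show abc123:path/to/your/file.ext > file.copy.ext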

Upvotes: 1
