Reputation: 1249
I want to do a fetch from a remote Git repository programmatically. I found experimentally that `git fetch-pack`, when it works, does exactly what I want:

- Automatically downloads all new objects into the local repository's object store.
- Displays status updates on the terminal (`stderr`) as to the progress of what it's doing, but does not print anything that `git fetch` would about the status of refs, such as `* [new ref]`.
- Does not change any local refs, instead just dumping the remote's refs onto `stdout`, allowing my program to itself update refs according to custom logic.
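For instance, that behaviour can be seen against a throwaway local repository (the paths here are placeholders built just for the illustration):

```shell
set -e
remote=$(mktemp -d)
repo=$(mktemp -d)
git init -q "$remote"
git -C "$remote" -c user.email=a@example.com -c user.name=a \
    commit -q --allow-empty -m init
git init -q "$repo"
# fetch-pack writes the objects into $repo's store and dumps the
# remote's refs ("<oid> <refname>" lines) on stdout; no local refs
# are created or changed.
git -C "$repo" fetch-pack --all "$remote"
```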
This works beautifully when testing on remote repos that are on a local filesystem. However, I have found that calling `git fetch-pack` with an HTTP or HTTPS URL does not work. It says `fatal: protocol 'http' is not supported`. It appears that `fetch-pack` is intended for the SSH and `git` protocols, not for HTTP.

Am I right that `fetch-pack` isn't meant to work with HTTP? If so, what is the closest alternative?
I have considered:

1. `git fetch` into a temporary ref namespace. This is extremely ugly. It misleadingly prints `* [new ref]` for every ref, and, while it's possible to use switches to silence this, that also squelches the progress display I do want. It creates spurious refs that will then have to be deleted. Worst of all, it may behave poorly if remote refs contain characters that are legal to Git but illegal on the local filesystem, or if any remote refs are differentiated only by letter case but the local filesystem folds case. I don't want refs to touch the local filesystem at all until I call `git update-ref` myself.
2. Implement the smart HTTP client protocol myself. The protocol is documented, but this would be a fair bit of work, most of which would reinvent the wheel.
3. Use libgit2. I suspect that `git_remote_download` would do what I want. Unfortunately, this library's documentation is generally quite terse (in contrast to, say, WinAPI or Direct3D), and the examples only cover the higher-level convenience function, `git_remote_fetch`, which does what I don't want. Just as the `git-fetch-pack` documentation gave no warning that it wouldn't work over HTTP, I could be barking up the wrong tree here, too.
I think the third is probably the best approach, i.e., the simplest one that will behave well. So I'm going to try that, but I'm also posting this question in case anyone knows in the meantime that (i) it won't work, (ii) there's a simpler way, or (iii) it will work and is the simplest way, but I should look out for x/y/z gotcha.
Upvotes: 3
Views: 134
Reputation: 1249
The suggestion from dan1st worked, but I'll share some additional subtleties.

Also, as noted by Brian61354270, only newer Git versions support `--porcelain` on `fetch`.

The full command to get the closest behaviour to `fetch-pack --all` seems to be:

```
git -C <local-repo-path> -c advice.fetchShowForcedUpdates=false fetch --dry-run --force --no-tags --porcelain --verbose --no-show-forced-updates <remote-url> "refs/*:refs/*"
```

(Quote the refspec so the shell doesn't glob the `*`.)
`--dry-run` prevents the command from actually updating any refs. Empirically, it does still populate local objects, but I haven't seen anything in the documentation to guarantee this, and in principle it wouldn't be expected to; so if your application crucially depends on the objects being populated, you may want to add a check to confirm they're there. Empirically, `update-ref` will return failure if you try to set a ref to an object that isn't present, but this isn't documented either.
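A cheap way to add such a check is `git cat-file -e`, which exits zero only if the object exists locally (the repo and OID below are placeholders built in a throwaway repo):

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.email=a@example.com -c user.name=a \
    commit -q --allow-empty -m init
oid=$(git -C "$repo" rev-parse HEAD)
# Exit status 0 means the object is in the local store; guard your
# update-ref calls on this after the --dry-run fetch.
if git -C "$repo" cat-file -e "$oid"; then
    echo "have $oid"
fi
```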
`--porcelain` not only makes the ref status easier to parse, but also sends it to `stdout` instead of `stderr`. This is quite useful when you don't want the user to see this potentially misleading ref status, but still want to display the usual progress output (which remains on `stderr`).

With `--porcelain`, only the local ref that would be updated is output, not the remote ref, but if you map them identically, that effectively gives you the remote ref. (It doesn't matter if you're not actually mapping things that way, since the command won't write any refs anyway.)
I used `--force` just to be defensive, to make sure everything gets downloaded in spite of any update rules, although I suspect it would be anyway. A `+` in the refspec should mean the same thing, but `--force` sounds a little more convincing.

The output actually specifies what would be done for each ref, even if you're only interested in which refs exist. That output may include whether or not each ref is a forced update (non-fast-forward). That is an expensive test (a tree walk), so if you're not actually using that information (or you get it elsewhere), you should pass `--no-show-forced-updates`. Doing so may print a warning to `stderr` unless you also set `advice.fetchShowForcedUpdates=false`.
`--verbose` is needed to include up-to-date refs in the output, and `--no-tags` prevents tags from being listed twice: the command otherwise fetches tags implicitly, so it may list them a second time if your refspec already includes them. (Yes, it actually did that, silly as it sounds.)

Pay careful attention to the flag field in the output when using `--verbose`: the flag for an up-to-date reference is just a single space, but space is also the field separator! This can bite you if anything trims the output before you parse it, or if you simply split the whole line on spaces.
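A parsing sketch that avoids the pitfall: treat the flag as a fixed one-character field rather than word-splitting the line. The sample lines are made up, in the `<flag> <old-oid> <new-oid> <local-ref>` shape the porcelain format uses:

```shell
# Made-up sample of --porcelain output; note the first line's flag is
# a single space, which naive whitespace splitting would destroy.
sample='  1111111111111111111111111111111111111111 2222222222222222222222222222222222222222 refs/heads/main
* 0000000000000000000000000000000000000000 3333333333333333333333333333333333333333 refs/heads/topic'

while IFS= read -r line; do
    flag=$(printf '%.1s' "$line")   # fixed-position flag; may be a space
    rest=${line#??}                 # drop the flag and its separator
    old=${rest%% *}; rest=${rest#* }
    new=${rest%% *}; ref=${rest#* }
    printf 'flag=[%s] ref=%s\n' "$flag" "$ref"
done <<EOF
$sample
EOF
```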
Apart from superficial differences in the `stdout` formatting, the functional differences seem to be:

- `fetch` should work on all protocols/transports, whereas it seems `fetch-pack` doesn't work on HTTP(S).
- `fetch-pack` wants a URL; you can give `fetch` a URL or a named remote.
- `fetch` doesn't output a line for `HEAD`, but if needed you can check it via `ls-remote --symref`.
- As with `fetch-pack`, `fetch` won't differentiate between symbolic refs and direct refs. `ls-remote --symref` can obtain symbolic refs; the docs indicate that when going through `upload-pack` on the remote, this only works for `HEAD`.
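For example (throwaway repo; `-b main` just pins the branch name for the illustration):

```shell
set -e
remote=$(mktemp -d)
git init -q -b main "$remote"
git -C "$remote" -c user.email=a@example.com -c user.name=a \
    commit -q --allow-empty -m init
# The "ref: refs/heads/main	HEAD" line reveals the symref target,
# i.e. the remote's default branch.
git ls-remote --symref "$remote" HEAD
```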
When doing updates, typically you'll want to send a batched transaction to `git update-ref --stdin`.
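A minimal sketch of such a batch, using a throwaway repo (the ref names are placeholders): with `start`/`prepare`/`commit`, the updates apply atomically or not at all.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.email=a@example.com -c user.name=a \
    commit -q --allow-empty -m init
oid=$(git -C "$repo" rev-parse HEAD)
# One transaction: if any update is invalid (e.g. a missing object),
# none of them happen.
git -C "$repo" update-ref --stdin <<EOF
start
update refs/heads/alpha $oid
update refs/heads/beta $oid
prepare
commit
EOF
```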
`fetch` is higher-level than `fetch-pack` or `git_remote_download`, so, although as best I can tell `fetch` does obtain everything when given `--force` and `refs/*:refs/*`, in principle it could be somewhat more likely to do funny stuff, now or in the future. (Just as an example, it has special rules for tags. The risk is minimal, but still higher than with low-level functions.)

Since `fetch --dry-run` seemed to work fine, I didn't even try the libgit2 approach, so I can't say how that would have worked. Had I been doing this in a language where calling into a C library is more straightforward, I might have tried that first.
Another possible option I had considered, and which Jim Redmond also mentioned, is to list the refs separately via `ls-remote`. This will work fine most of the time (with the added benefit that you can at least see the `HEAD` symref), but if there are concurrent changes upstream, a race condition makes it possible that you try to set a ref to an object you don't have.

One possible way around that is to use `--stdin` with `fetch` and give it a refspec for every object, in the form `232ef0134b4807085b190a3b9b01bee3eb6dfab8:refs/anything`. That seems to work and should make sure you actually get all the objects you're expecting. But just using `--porcelain` seems slightly easier.
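A sketch under stated assumptions: the server must permit fetching by object ID (forced on here via `uploadpack.allowAnySHA1InWant` in a throwaway "remote"; real servers may refuse), and `refs/pinned/main` is just an illustrative name:

```shell
set -e
remote=$(mktemp -d)
repo=$(mktemp -d)
git init -q "$remote"
git -C "$remote" -c user.email=a@example.com -c user.name=a \
    commit -q --allow-empty -m init
git -C "$remote" config uploadpack.allowAnySHA1InWant true
oid=$(git -C "$remote" rev-parse HEAD)   # pretend this came from ls-remote
git init -q "$repo"
# Feed "<oid>:<ref>" refspecs on stdin: we fetch exactly the objects
# ls-remote reported, so a concurrent push can't leave us missing one.
printf '%s:refs/pinned/main\n' "$oid" |
    git -C "$repo" fetch --stdin "$remote"
```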
In most use cases, you don't need to go to this much trouble: you can just ask Git to fetch and be done with it.
I chose to bypass Git's high-level reference updating logic for two reasons:
My application applies some custom rules that involve interdependencies and can't all be expressed in refspecs, at least not without degenerating to one-by-one treatment.
Given the nature of this particular application, I wanted an absolute guarantee that it will always do exactly what I intended, or else panic without changing anything. There's one way to always know for sure what a program is going to do.
Upvotes: 1