Deep Thought

Reputation: 1249

What's the closest equivalent to `git fetch-pack` that will work over HTTP?

I want to do a fetch from a remote Git repository programmatically. I found experimentally that git fetch-pack, when it works, does exactly what I want:

This works beautifully when testing on remote repos that are on a local filesystem. However, I have found that calling git fetch-pack with an HTTP or HTTPS URL does not work. It says fatal: protocol 'http' is not supported. It appears that fetch-pack is intended for the SSH and git protocols, not for HTTP.

Am I right that fetch-pack isn't meant to work with HTTP? If so, what is the closest alternative?

I have considered:

I think the third is probably the best approach, i.e., the simplest one that will behave well. So, I'm going to try that, but also post this question in case anyone knows in the meantime that (i) it won't work, (ii) there's a simpler way, or (iii) it will work and it's the simplest way, but to look out for x/y/z gotcha.

Upvotes: 3

Views: 134

Answers (1)

Deep Thought

Reputation: 1249

The suggestion from dan1st worked, but I'll share some additional subtleties.

Also, as noted by Brian61354270, only newer Git versions support --porcelain on fetch.

The full command to get the closest behaviour to fetch-pack --all seems to be:

git -C <local-repo-path> -c advice.fetchShowForcedUpdates=false fetch --dry-run --force --no-tags --porcelain --verbose --no-show-forced-updates <remote-url> refs/*:refs/*

--dry-run prevents the command from actually updating any refs. Empirically, it does still populate local objects, but I haven't seen anything in the documentation to guarantee this, and in principle it wouldn't be expected to, so if your application crucially expects that objects are populated, you may want to add a test to confirm they're there. Empirically, update-ref will return failure if you try to set a ref to an object that isn't present, but this isn't documented, either.
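One way to add the test mentioned above is to ask Git whether each object ID is present before using it. This is only a sketch: object_exists is a hypothetical helper name, and it relies on git cat-file -e exiting with status 0 when the object exists. The run parameter is there so that the subprocess call can be replaced in tests.

```python
import subprocess


def object_exists(repo_path, oid, run=subprocess.run):
    """Return True if `oid` names an object present in the repository.

    `object_exists` is a hypothetical helper, not part of Git.
    """
    # `git cat-file -e` exits with status 0 only if the object exists and is valid.
    result = run(
        ["git", "-C", repo_path, "cat-file", "-e", oid],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Calling it for every new object ID reported by the dry run, before touching any refs, keeps the "panic without changing anything" property.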

--porcelain not only makes the ref status easier to parse, but it also sends it to stdout instead of stderr. This is quite useful where you don't want the user to see this potentially false ref status, but still want to display the usual progress display (which will still be on stderr).
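As a rough illustration of that stdout/stderr split, here is how the command could be driven from Python. The function names are hypothetical, and the flags are simply the ones from the command above. Passing the arguments as a list also avoids any shell expansion of refs/*.

```python
import subprocess


def build_fetch_argv(repo_path, remote_url):
    """Build the argument list for the dry-run fetch (hypothetical helper)."""
    return [
        "git", "-C", repo_path,
        "-c", "advice.fetchShowForcedUpdates=false",
        "fetch", "--dry-run", "--force", "--no-tags", "--porcelain",
        "--verbose", "--no-show-forced-updates",
        remote_url, "refs/*:refs/*",
    ]


def run_dry_fetch(repo_path, remote_url):
    """Run the fetch and return the porcelain ref status as a list of lines."""
    # Only stdout is captured; stderr is inherited, so the user still sees
    # the usual progress display.
    result = subprocess.run(
        build_fetch_argv(repo_path, remote_url),
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    # splitlines() removes line terminators only; leading spaces are preserved.
    return result.stdout.splitlines()
```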

With --porcelain, it only outputs the local ref that would be updated, not the remote ref, but if you map them identically, that will effectively give you the remote ref. (It doesn't matter if you're not actually mapping things that way, since the command won't write any refs anyway.)

I used --force just to be defensive, to make sure everything will get downloaded in spite of any update rules, although I suspect it would be regardless. + in the refspec should mean the same thing, but --force sounds a little more convincing.

The output actually specifies what would be done for each ref, even if you're only interested in which refs exist. That output may include whether or not the ref is a forced update (non-fast-forward). That is an expensive test (it has to walk the commit history), so if you're not actually using that information (or get it elsewhere), you should pass --no-show-forced-updates. Doing so may print a warning to stderr unless you also set advice.fetchShowForcedUpdates=false.

--verbose includes refs that are already up to date, and --no-tags prevents tags from being listed twice: the command otherwise fetches tags implicitly, so it may list them a second time if your refspec already includes them. (Yes, it did actually do that, silly as it sounds.)

Pay careful attention to the flag field in the output, as one of the flags is just a single space (the git fetch documentation lists a space for a fast-forward update and = for an up-to-date ref), but space is also the field separator! This can bite you if anything trims the output before you parse it, or if you try to split the whole line on spaces.
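A minimal parsing sketch that avoids this trap, assuming the documented porcelain layout of flag, old object ID, new object ID and local ref, each separated by one space. RefUpdate and parse_porcelain_line are hypothetical names. The flag is taken by position rather than by splitting, so a space flag survives.

```python
from typing import NamedTuple


class RefUpdate(NamedTuple):
    flag: str
    old_oid: str
    new_oid: str
    local_ref: str


def parse_porcelain_line(line):
    """Parse one line of `git fetch --porcelain` output (hypothetical helper)."""
    # Remove only the line terminator; never strip leading whitespace,
    # because the first character is the flag and may itself be a space.
    line = line.rstrip("\r\n")
    if len(line) < 3 or line[1] != " ":
        raise ValueError(f"unexpected porcelain line: {line!r}")
    flag = line[0]
    fields = line[2:].split(" ")
    if len(fields) != 3:
        raise ValueError(f"unexpected porcelain line: {line!r}")
    old_oid, new_oid, local_ref = fields
    return RefUpdate(flag, old_oid, new_oid, local_ref)
```

Raising on anything unexpected, rather than guessing, fits an application that would rather stop than act on a misread ref status.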

Apart from superficial differences in the stdout formatting, functional differences seem to be:

  • fetch should work on all protocols/transports, whereas it seems fetch-pack doesn't work on HTTP/S.
  • fetch-pack wants a URL; you can give fetch a URL or named remote.
  • fetch doesn't output a line for HEAD, but if needed you can check it via ls-remote --symref.

As with fetch-pack, fetch won't differentiate between symbolic refs and direct refs. ls-remote --symref can obtain symbolic refs; docs indicate that when going through upload-pack on the remote, this only works for HEAD.
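For reference, a small sketch of reading ls-remote --symref output, assuming its usual layout: tab-separated lines, where a symbolic ref appears as "ref: <target>" followed by the ref name, and a direct ref appears as the object ID followed by the ref name. parse_ls_remote is a hypothetical name.

```python
def parse_ls_remote(output):
    """Split `git ls-remote --symref` output into (symrefs, refs) dicts.

    Hypothetical helper: `symrefs` maps a ref name to its target ref,
    `refs` maps a ref name to its object ID.
    """
    symrefs = {}
    refs = {}
    for line in output.splitlines():
        if not line:
            continue
        value, _, name = line.partition("\t")
        if value.startswith("ref: "):
            symrefs[name] = value[len("ref: "):]
        else:
            refs[name] = value
    return symrefs, refs
```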

When doing updates, typically you'll want to send a batched transaction to git update-ref --stdin.
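As a sketch of what such a batch might look like, the helper below builds the text to feed to git update-ref --stdin, using the documented "update <ref> <new-oid> <old-oid>" line form. build_update_ref_input is a hypothetical name, and it assumes ref names contain no whitespace (Git does not allow spaces in ref names).

```python
def build_update_ref_input(updates):
    """Build stdin text for `git update-ref --stdin` (hypothetical helper).

    `updates` is an iterable of (ref, new_oid, old_oid) tuples. Passing the
    old object ID makes Git verify the ref's current value before updating;
    an all-zero old ID requires that the ref does not exist yet.
    """
    lines = [f"update {ref} {new_oid} {old_oid}\n" for ref, new_oid, old_oid in updates]
    return "".join(lines)
```

The old and new object IDs reported by fetch --porcelain can be passed straight through, since a ref that does not exist locally is reported there with an all-zero old ID. According to the update-ref documentation, either all of the listed modifications are performed or none are.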

fetch is higher level than fetch-pack or git_remote_download, so, although as best as I can tell fetch does obtain everything when given --force and refs/*:refs/*, in principle it could be somewhat more likely to do funny stuff, now or in the future. (Just as an example, it has special rules for tags. The risk is minimal, but still higher than with low-level functions.)

Since fetch --dry-run seemed to work fine, I didn't even try the libgit2 approach, so I can't say how that would have worked. Had I been doing this in a language where calling into a C library is more straightforward, I might have tried that first.

Listing refs separately

Another possible option I had considered, and which Jim Redmond also mentioned, is to list the refs separately via ls-remote.

This will work fine most of the time (and with the added benefit that you can see at least the HEAD symref), but if there are concurrent changes upstream, then it's possible due to a race condition that you try to set a ref to an object that you don't have.

One possible way around that is to use --stdin with fetch and give it a refspec for every object, in the form 232ef0134b4807085b190a3b9b01bee3eb6dfab8:refs/anything. That seems to work and should make sure you actually get all the objects you're expecting. But just using --porcelain seems slightly easier.
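A sketch of generating that refspec list from a mapping of ref names to object IDs (for example, one obtained from ls-remote). build_stdin_refspecs is a hypothetical name, and skipping anything outside refs/ (such as HEAD) is my own assumption about what you would want here.

```python
def build_stdin_refspecs(refs):
    """Build one `<oid>:<ref>` refspec per line for `git fetch --stdin`.

    Hypothetical helper: `refs` maps ref names to object IDs. Names outside
    refs/ (for example HEAD) are skipped, which is an assumption of this sketch.
    """
    return "".join(
        f"{oid}:{ref}\n"
        for ref, oid in sorted(refs.items())
        if ref.startswith("refs/")
    )
```

The resulting text would be written to the standard input of git fetch --stdin, alongside the other options.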

Just let Git do it

In most use cases, you don't need to go to this trouble, and can just ask Git to fetch, and be done with it.

I chose to bypass Git's high-level reference updating logic for two reasons:

  1. My application applies some custom rules that involve interdependencies and can't all be expressed in refspecs, at least not without degenerating to one-by-one treatment.

  2. Given the nature of this particular application, I wanted an absolute guarantee that it will always do exactly what I intended, or else panic without changing anything. There's one way to always know for sure what a program is going to do.

Upvotes: 1
