Reputation: 167
I'm looking for a way to give users access to data of a specific version of a repo through raw.githubusercontent.com, without massively increasing the size of the already very large repo.
Context: I have a GitHub repo that contains a large amount of data created by a text reuse algorithm. The same algorithm is run twice a year on a corpus of texts that undergoes change, so the data in this repo changes twice a year.
In order to allow access to a specific version of the text reuse data, I was thinking of using tags; in my understanding, a tag is simply a pointer to a specific commit, so it should not significantly increase the size of the repo.
However, this does not seem to be the case according to an experiment I ran (see below): when I push a tag to GitHub, it also creates a zip file that contains the full data at the time of the commit. It even does this when I have not pushed that commit to GitHub yet, and even if I use the lightweight tag format. It seems that this will increase the size of my GitHub repo enormously.
Is there a way to add tags without the creation of the .zip file?
If not, I was thinking of creating a new branch each time the algorithm is run; the branch would not be updated but only serve as a pointer to that specific commit. Would this increase the size of the repo? Are there other downsides of using a branch for this?
Setting up the experiment:
Step 1: annotated tag
git tag -a v1.0 -m "First version with dummy data"
git push origin main
git push origin v1.0
Step 2: annotated tag, push tag first
git tag -a v2.0 -m "Second version with new dummy data"
git push origin v2.0
git push origin main
Step 3: lightweight tags
git tag v3.0
git push origin v3.0
Upvotes: 2
Views: 1242
Reputation: 76884
GitHub provides the ability to download a tarball or zip file, which is automatically generated from the snapshot of a repository at a given state. The download links appear automatically in the web interface for every tag, but the archives themselves are created on demand: if nobody requests them, then they'll never be produced. Pushing a tag therefore adds only the tag itself (plus any commits you had not pushed yet), not a stored zip file.
When you tag a commit and push the tag but not the branch on which the commit was made, Git still pushes the commit, along with any of its ancestors that the remote doesn't already have. If it didn't, the tag would point to a commit that doesn't exist on the remote, and the repository would be corrupt. Thus, in each of your three steps, you're always pushing the commits that the tags refer to.
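You can check this behaviour locally without involving GitHub. The following is a minimal sketch, assuming a throwaway bare repository stands in for the remote; the paths, the file name data.txt, and the committer identity are placeholders invented for the demonstration. It pushes only a lightweight tag, never a branch, and then asks the remote whether it has the tagged commit.

```shell
set -eu

# Throwaway working area: a bare repo plays the role of the GitHub remote.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"

# Placeholder identity so the commit works on a machine without global config.
git config user.email "demo@example.com"
git config user.name "Demo"

echo "dummy data" > data.txt
git add data.txt
git commit -q -m "Add dummy data"

# Lightweight tag, pushed on its own; no branch is ever pushed.
git tag v1.0
git remote add origin "$work/remote.git"
git push -q origin v1.0

# Ask the remote what it received.
tagged_commit=$(git rev-parse v1.0)
remote_type=$(git --git-dir="$work/remote.git" cat-file -t "$tagged_commit")
remote_branches=$(git --git-dir="$work/remote.git" for-each-ref refs/heads | wc -l | tr -d ' ')

echo "object type on remote: $remote_type"
echo "branches on remote: $remote_branches"
```

The remote ends up holding the commit even though it has no branches, which is why the tag showed up on GitHub before you pushed main in your second step.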
You can create those archives on your own machine with git archive. For example, if you want to create an archive of v1, you might do:
git archive --format=zip -o v1.zip --prefix=v1/ v1.0
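To see what such an archive contains, here is a small sketch that runs git archive in a scratch repository. The repository, the file data.txt, and the committer identity are made up for illustration. It writes the zip as in the command above and, as a convenience that is not part of the original command, also writes a tar archive of the same tag, because tar can list its contents without needing an unzip tool.

```shell
set -eu

# Scratch repository with one commit and an annotated tag.
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.email "demo@example.com"
git config user.name "Demo"

echo "dummy data" > data.txt
git add data.txt
git commit -q -m "First version with dummy data"
git tag -a v1.0 -m "First version with dummy data"

# Zip archive of the tagged snapshot, with every path placed under v1/.
git archive --format=zip -o "$work/v1.zip" --prefix=v1/ v1.0

# Same snapshot as a tar archive, used here only to list the contents.
git archive --format=tar -o "$work/v1.tar" --prefix=v1/ v1.0
listing=$(tar -tf "$work/v1.tar")
echo "$listing"
```

Both archives are written outside the repository and nothing new is committed, so producing them leaves the repository's history and size unchanged.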
Upvotes: 3