Thibaut G

Reputation: 41

Merge a GitLab repository with a large file into a GitHub repository

We work on an open source project with two repositories: a private one on GitLab and a public one on GitHub. The team works every day on the private repo to deliver new features and bug fixes to our customers through a SaaS offering. Every six months, changes from the GitLab repo are merged into the public GitHub repo.

I couldn't merge it this time. A large file (500 MB) was committed a few months ago, and it can't be pushed to GitHub because the maximum allowed file size is 100 MB.

The file is now smaller (there was no need for such a large file), but the large version remains in the history. So I can't migrate to Git LFS without losing the upstream.

It seems like I have no other choice but to rewrite history on GitLab? How could I do it "safely"? What will happen to the other active branches?

After that, will I have no other choice but to hard-reset the GitHub repository to the GitLab repository?

Upvotes: 1

Views: 266

Answers (1)

TTT

Reputation: 29004

It seems like I have no other choice but to rewrite history on GitLab?

Since you said the 500 MB file wasn't needed and was perhaps committed in error, I would lean towards rewriting. The rule of thumb is to avoid rewriting shared branches, and obviously the private GitLab repo is "shared", but it may be "less shared" than the public GitHub repo. So if you have to rewrite something, doing it on the GitLab side is probably less painful. Fortunately, you won't need to also do this:

will I have no other choice but to hard-reset the GitHub repository to the GitLab repository?

You won't need to rewrite anything on GitHub because you are only going to rewrite commits in GitLab that haven't been merged to GitHub yet.
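If you want to double-check that before rewriting anything, here is a rough sketch of how you might list oversized blobs reachable from a given ref. The function name `list_large_blobs` and the ref names in the usage note are my own placeholders, not something from your setup:

```shell
# Sketch: print "<size> <path>" for every blob larger than a byte limit
# among the objects selected by the given rev-list arguments.
# list_large_blobs is a hypothetical helper name, not a git command.
list_large_blobs() {
  local limit="$1"
  shift
  # rev-list emits "<sha> <path>"; cat-file adds type and size and
  # passes the path through via %(rest).
  git rev-list --objects "$@" |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk -v limit="$limit" '$1 == "blob" && $3 > limit {
      size = $3
      sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")   # keep only the path
      print size, $0
    }'
}
```

Assuming you have both repos as remotes in one clone (say `github` and `gitlab`, both hypothetical names), `list_large_blobs $((100*1024*1024)) github/main` should print nothing, while `list_large_blobs $((100*1024*1024)) gitlab/main --not github/main` should show the 500 MB blob, which would confirm the problem only exists in commits that never reached GitHub.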

I'd recommend using git-filter-repo to rewrite your private GitLab repo. You'll want to read up on the options, but once you have it set up, you'll make a fresh clone of the GitLab repo and rewrite it without the large file. git-filter-repo will only rewrite the minimum number of commits necessary to remove that file, meaning none of the commits already in GitHub will be rewritten. Only the commit that added the large file, and every commit on the branch after that commit, will be rewritten.

There is also an option to do this for every branch in the repo that came after the rewritten commit. If you choose to do that, you could have your team pause work for 10 minutes (or a night), rewrite the repo, force push it out, and tell everyone on your team to delete their local branches and check them out again. (Or they could make a new clone for sanity purposes.)

Developers with in-flight changes that haven't been pushed yet will probably want to use git rebase --onto to move their commits to the rewritten branch. Or, you could ask everyone to push their in-flight branches to GitLab before you rewrite.
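To make that a bit more concrete, here is a minimal sketch of the three steps as shell functions. Treat it as an outline under assumptions rather than a recipe: the function names, the remote URL argument, and the branch names are all placeholders I made up, and you should read the git-filter-repo manual before running anything against the real repo.

```shell
# Sketch only. Assumes git-filter-repo is installed (it is a separate tool,
# not part of git) and that the first two functions are run inside a FRESH
# clone of the GitLab repo. All function names here are hypothetical.

# 1. Drop every blob over 100 MB from history; the smaller, current
#    version of the file is left in place.
strip_oversized_blobs() {
  git filter-repo --strip-blobs-bigger-than 100M
}

# 2. Publish the rewritten history. filter-repo removes the "origin"
#    remote as a safety measure, so it is re-added before force pushing.
publish_rewrite() {
  local url="$1"   # placeholder: your GitLab repo URL
  git remote add origin "$url"
  git push --force --all origin
  git push --force --tags origin
}

# 3. For a developer with unpushed work: replay only the commits in
#    old_base..branch on top of the rewritten base.
move_to_rewritten() {
  local old_base="$1" new_base="$2" branch="$3"
  git rebase --onto "$new_base" "$old_base" "$branch"
}
```

For step 3, a developer would fetch first and then run something like `move_to_rewritten 'origin/main@{1}' origin/main my-feature`, where `origin/main@{1}` is the pre-rewrite tip taken from the reflog (tagging the old tip before fetching is a safer way to remember it). Also keep in mind that GitLab protected branches normally reject force pushes, so you may need to allow them temporarily for step 2.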

Side Note: one downside of a rewrite is that GitLab history (such as Merge Requests) referencing commits in the rewritten portion of the repo will point to commits that no longer exist. There isn't much you can do about this, but it's important to realize it will happen.

Tip: I would consider enabling a setting in GitLab to limit the max size you can push. This will prevent you from accidentally pushing files you won't be able to bring over to GitHub later.
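If that server-side setting isn't available to you, a local check can act as a partial safety net. To be clear, this is not the GitLab setting itself, just a sketch of something you could call from a pre-commit hook. The function name `check_staged_sizes` is made up, and it assumes file names without unusual characters:

```shell
# Sketch: fail if any staged (added or modified) file exceeds a byte limit.
# Intended to be called from .git/hooks/pre-commit, which git runs from
# the top of the working tree. check_staged_sizes is a hypothetical name.
check_staged_sizes() {
  local limit="$1" rc=0 path size
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    # ":<path>" names the staged version of the file in the index.
    size=$(git cat-file -s ":$path")
    if [ "$size" -gt "$limit" ]; then
      echo "too large: $path ($size bytes)" >&2
      rc=1
    fi
  done <<EOF
$(git diff --cached --name-only --diff-filter=AM)
EOF
  return "$rc"
}
```

A hook could then be as short as `check_staged_sizes $((100*1024*1024)) || exit 1`. Since hooks are per-clone and easy to bypass, I'd treat this as a complement to the server-side limit, not a replacement.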

Upvotes: 1
