Matt

Reputation: 2070

Permanently removing binary files from GitLab repos

We have a GitLab-hosted repo at work that contains some large binary files that we'd like to remove. I know of tools such as BFG Repo-Cleaner which will remove a file from a Git repository.

We often refer to specific commit IDs in GitLab. Would running BFG Repo-Cleaner mess these up?

If so, is there a better way to clean a repo that wouldn't mess these up?

Upvotes: 0

Views: 1293

Answers (2)

Roberto Tyley

Reputation: 25314

We often refer to specific commit IDs in GitLab.

Although git history can't be modified without changing all subsequent commit ids, the BFG does a few things that will help with the change:

  1. As it cleans your repo, the BFG also updates any object ids it finds in commit messages to their new ids. If you are deleting private data, this is a straight substitution. If you're just deleting big files (i.e. the commit ids themselves don't imply sensitive information), the text in your commit message becomes "$newId [formerly $oldId]", and a Former-commit-id: footer is added to the bottom of every modified commit message.
  2. The BFG also creates an object-id-map.old-new.txt file under the repo-name.bfg-report directory every time it runs. In principle, I believe this file could be used on a GitLab repo so that other references to commit ids could be fixed too.
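
To make the second point concrete, here is a minimal sketch of how such a map could be used to translate commit ids in other text, such as issue comments or wiki pages. It assumes each line of object-id-map.old-new.txt holds an old id and a new id separated by whitespace; check the file your BFG run actually produces before relying on that. The names parse_id_map and rewrite_ids are hypothetical helpers, not part of the BFG or GitLab, and the ids are made up.

```python
import re

# Matches full 40-character hex ids only; abbreviated ids are not handled here.
FULL_ID = re.compile(r"\b[0-9a-f]{40}\b")


def parse_id_map(text):
    """Parse lines assumed to look like '<old-id> <new-id>' into a dict."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def rewrite_ids(text, mapping):
    """Swap every full id that appears in the mapping; leave other text alone."""
    return FULL_ID.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


# Made-up ids standing in for real map entries.
OLD_ID = "a" * 40
NEW_ID = "b" * 40
id_map = parse_id_map(f"{OLD_ID} {NEW_ID}\n")

note = f"Regression introduced in {OLD_ID}, see the issue tracker."
print(rewrite_ids(note, id_map))
```

Getting the text out of GitLab and back in again (issues, merge request comments, wiki pages) is a separate job and is not covered by this sketch.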

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Upvotes: 1

larsks

Reputation: 312048

We often refer to specific commit IDs in GitLab. Would running BFG Repo-Cleaner mess these up?

A git commit id is the hash of the commit's contents, and those contents include the id of its parent commit. This means that any operation that modifies your history will result in (a) a new commit id for whatever commit you modify and (b) a new commit id for every descendant commit.

There is no way to modify the history of your repository without generating a new sequence of commit ids.
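
The chaining can be illustrated with a small sketch that hashes commit objects by hand. It assumes a repository using git's SHA-1 object format, where an object id is the SHA-1 of a "type size" header, a NUL byte, and the object body. The author identity, timestamps, and the big_tree id are made-up placeholders, and git_object_id and commit_body are hypothetical helper names.

```python
import hashlib


def git_object_id(obj_type, body):
    """Hash an object the way git does for SHA-1 repos: header + NUL + body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parent, message):
    """Build the text of a commit object; parent is None for a root commit."""
    person = "A U Thor <author@example.com> 1700000000 +0000"  # made-up identity
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [f"author {person}", f"committer {person}", "", message, ""]
    return "\n".join(lines).encode()


big_tree = "1" * 40  # placeholder id for a tree that still holds the big file
clean_tree = git_object_id("tree", b"")  # id of the empty tree

root_before = git_object_id("commit", commit_body(big_tree, None, "Add files"))
child_before = git_object_id("commit", commit_body(clean_tree, root_before, "Fix typo"))

# Rewrite only the root so it no longer references the big file.
root_after = git_object_id("commit", commit_body(clean_tree, None, "Add files"))
# The child is otherwise unchanged, but it must name its new parent.
child_after = git_object_id("commit", commit_body(clean_tree, root_after, "Fix typo"))

print(root_before != root_after)    # the modified commit gets a new id
print(child_before != child_after)  # so does its untouched descendant
```

The child commit has the same tree, message, and author in both versions. Only its parent line differs, and that alone is enough to change its id.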

Upvotes: 1
