Reputation: 61
Summary
When I create a backup of GitLab, I always get a different checksum.
Steps to reproduce
sudo gitlab-rake gitlab:backup:create STRATEGY=copy
What is the current bug
I created a backup script that backs up GitLab hourly and sends the archive to cloud storage. Two archives with identical contents always have different checksums. Why do two folders with the same files and the same per-file checksums produce archives with different checksums? The content has not changed, yet the archive checksum changes every time. Why?
What is the expected correct behavior?
When the archived content has not changed, the archive should have the same checksum.
Relevant logs and/or screenshots
94e779cbe595eda6f79f15437d6059ec50c40de9efe01c7c8227b2c799556aac artifacts.tar.gz (first)
a15da160a4bc6d308f47bd0ebbbeaa09c549f07136d6f13203f05cf0374c77d2 569.log
709b40d737572628d282d5c5f97a62ea4681560d3300f5c126d34436a375618d 570.log
caf0c823c22213c63a86299c4100aec8e8913d3ef6209d36e893982d6fdf3510 571.log
dc77e18335dde4e2ba3ac38d4b2c8b9f59785057e871cceaea172596d3932a0c 572.log
709b40d737572628d282d5c5f97a62ea4681560d3300f5c126d34436a375618d 573.log
14ec475a0cbfc50408a010e14c7f5ab91ae4f675046b53b0e4a65d5dec7e2b79 574.log
67fbe4206bc4b2e5298472e155b81643fb8a30ab41b3c7971e2c9c9c0af1d9a7 artifacts.tar.gz (second)
a15da160a4bc6d308f47bd0ebbbeaa09c549f07136d6f13203f05cf0374c77d2 569.log
709b40d737572628d282d5c5f97a62ea4681560d3300f5c126d34436a375618d 570.log
caf0c823c22213c63a86299c4100aec8e8913d3ef6209d36e893982d6fdf3510 571.log
dc77e18335dde4e2ba3ac38d4b2c8b9f59785057e871cceaea172596d3932a0c 572.log
709b40d737572628d282d5c5f97a62ea4681560d3300f5c126d34436a375618d 573.log
14ec475a0cbfc50408a010e14c7f5ab91ae4f675046b53b0e4a65d5dec7e2b79 574.log
Output of checks
This bug happens on GitLab-CE Omnibus
Results of GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: sudo gitlab-rake gitlab:env:info)
System information
System: Ubuntu 16.04
Current User: git
Using RVM: no
Ruby Version: 2.3.5p376
Gem Version: 2.6.13
Bundler Version:1.13.7
Rake Version: 12.0.0
Redis Version: 3.2.5
Git Version: 2.13.5
Sidekiq Version:5.0.4
Go Version: unknown
GitLab information
Version: 10.0.1
Revision: 2417795
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
URL: https://git.site
HTTP Clone URL: https://git.site
SSH Clone URL: [email protected]
Using LDAP: no
Using Omniauth: no
GitLab Shell
Version: 5.9.0
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks
Git: /opt/gitlab/embedded/bin/git
My Issues on Gitlab.com
Upvotes: 1
Views: 379
Reputation: 1326366
This seems expected, considering that the copy strategy first copies the files and only then runs tar/gzip on them.
As explained in "Backup strategy option":
The default backup strategy is to essentially stream data from the respective data locations to the backup using the Linux command tar and gzip.
This works fine in most cases, but can cause problems when data is rapidly changing. When data changes while tar is reading it, the error "file changed as we read it" may occur, and will cause the backup process to fail.
To combat this, 8.17 introduces a new backup strategy called copy. The strategy copies data files to a temporary location before calling tar and gzip, avoiding the error. A side-effect is that the backup process will take up to an additional 1X disk space. The process does its best to clean up the temporary files at each stage so the problem doesn't compound, but it could be a considerable change for large installations. This is why the copy strategy is not the default in 8.17.
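To illustrate the copy step described above, here is a minimal Ruby sketch; it is not GitLab's actual code. The directory names and the `content_digests` helper are hypothetical, and `FileUtils.cp_r` with `preserve: true` only stands in for `cp -a`. It suggests that copying to a staging directory leaves every per-file checksum unchanged, which matches the identical `.log` checksums in the question.

```ruby
require 'digest'
require 'fileutils'
require 'tmpdir'

# Map each file's path (relative to root) to the SHA-256 of its contents.
# Hypothetical helper, used only for this illustration.
def content_digests(root)
  Dir.glob('**/*', base: root).sort.each_with_object({}) do |rel, acc|
    path = File.join(root, rel)
    acc[rel] = Digest::SHA256.file(path).hexdigest if File.file?(path)
  end
end

same = Dir.mktmpdir do |tmp|
  source  = File.join(tmp, 'artifacts') # hypothetical data directory
  staging = File.join(tmp, 'staging')   # hypothetical temporary copy

  FileUtils.mkdir_p(source)
  File.write(File.join(source, '569.log'), "job output\n")
  File.write(File.join(source, '570.log'), "more output\n")

  # Stand-in for `cp -a`: recursive copy that preserves mode and mtime.
  FileUtils.cp_r(source, staging, preserve: true)

  # The block's value (true/false) becomes the value of `same`.
  content_digests(source) == content_digests(staging)
end

puts "per-file checksums identical after copy: #{same}"
```

Because the contents survive the copy intact, any difference between two archives has to come from what the archive stores besides the file contents.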
See lib/backup/files.rb:
    # Copy files from public/files to backup/files
    def dump
      FileUtils.mkdir_p(Gitlab.config.backup.path)
      FileUtils.rm_f(backup_tarball)
      if ENV['STRATEGY'] == 'copy'
        cmd = %W(cp -a #{app_files_dir} #{Gitlab.config.backup.path})
        output, status = Gitlab::Popen.popen(cmd)
So the timestamps associated with those copied files change on every backup run, which makes the checksum of the resulting archive different even when the file contents are identical.
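As a general illustration of how a timestamp alone can change an archive's checksum, here is a small Ruby sketch. It uses the modification-time field of the gzip header as the example; whether that field, the tar entry metadata, or both are responsible in a given GitLab backup is an assumption that is not verified here. The `gzip_with_mtime` helper and the epoch values are hypothetical.

```ruby
require 'digest'
require 'stringio'
require 'zlib'

# Gzip `data` in memory, stamping the gzip header with `mtime`
# (integer seconds since the epoch). Hypothetical helper for this sketch.
def gzip_with_mtime(data, mtime)
  io = StringIO.new
  io.binmode
  gz = Zlib::GzipWriter.new(io)
  gz.mtime = mtime # header timestamp; must be set before the first write
  gz.write(data)
  gz.finish        # finish the gzip stream without closing `io`
  io.string
end

payload = "identical contents\n"

first  = gzip_with_mtime(payload, 1_500_000_000)
second = gzip_with_mtime(payload, 1_500_003_600) # stamped one hour later
pinned = gzip_with_mtime(payload, 1_500_000_000) # same stamp as `first`

# Same payload, different header timestamp: the checksums differ.
puts Digest::SHA256.hexdigest(first) == Digest::SHA256.hexdigest(second)

# Same payload, same header timestamp: the checksums match.
puts Digest::SHA256.hexdigest(first) == Digest::SHA256.hexdigest(pinned)

# The decompressed contents are identical in every case.
puts Zlib.gunzip(first) == Zlib.gunzip(second)
```

If comparable checksums are needed, one option is to checksum the extracted files (as the listings in the question already do) rather than the `.tar.gz` itself.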
Upvotes: 0