Reputation: 12487
This is the scenario. I want to be able to backup the contents of a folder using a python script. However, I want my backups to be stored in a zipped format, possibly bz2.
The problem comes from the fact that I don’t want to bother backing up the folder if the contents in the “current” folder are exactly the same as what is in my most recent backup.
My process will be like this:
Can anyone recomment the most reliable and simple way of completing step2? Do I have to unzip the contents of the backup and store in a temp directory to do a comparison or is there a more elegant way of doing this? Possibly to do with modified date?
Upvotes: 8
Views: 6632
Reputation: 336
I use this script to create compress backup of a directory only when the directory contents has changed after last backup.
I use external md5 file to store the digest of the backup file and I check it to detect directory changes.
import hashlib
import tarfile
import bz2
import cStringIO
import os
def backup_dir(dirname, backup_path):
fobj = cStringIO.StringIO()
t = tarfile.open(mode='w',fileobj=fobj)
t.add(dirname)
t.close()
buf = fobj.getvalue()
new_md5 = hashlib.md5(buf).digest()
if os.path.isfile(backup_path + '.md5'):
old_md5 = open(backup_path + '.md5').read()
else:
old_md5 = ''
if new_md5 <> old_md5:
open(backup_path, 'wb').write(bz2.compress(buf))
open(backup_path + '.md5', 'wb').write(new_md5)
print 'backup done!'
else:
print 'nothing to do'
Upvotes: 2
Reputation: 7050
Zip files contain CRC32 checksums and you can read them with the python zipfile module: http://docs.python.org/2/library/zipfile.html. You can get a list of ZipInfo objects with CRC members from ZipFile.infolist(). There are also modification dates in the ZipInfo object.
You can compare the zip checksum with calculated checksums for the unpacked files. You need to read the unpacked files but you avoid having to decompress everything.
CRC32 is not a cryptographic checksum but it should be enough if all you need is to check for changes.
This holds for zip files. Other archive formats (like tar.bz2) might not contain such easily-accessible metadata.
Upvotes: 6
Reputation: 1816
You could also try the following process:
1) Initiate backup
2) Run backup
3) Compare both compressed files:
import filecmp
filecmp.cmp(Compressed_new_file, Compressed_old_file, shallow=True)
4) If same – delete new backup file then "complete"
5) Else “complete”
NOTE: In case you need to check just the time between the modifications, you can have a look at this documentation
Rather than decompressing the folder and comparing individual files, I think it might be easier to compare the compressed files. Overall I feel (ok, its just an intuition :D) this will be better in case there is a high probability that the contents of the folder changes in between the times you run the script
Upvotes: 1
Reputation: 2076
Rsync will automatically detect and only copy modified files, but seeing as you want to bzip the results, you still need to detect if anything has changed.
How about you output the directory listing (including time stamps) to a text file alongside your archive. The next time you diff
the current directory structure against this stored text. You can grep differences out and pipe this file list to rsync to include
those changed files.
Upvotes: 1