Reputation: 17904
We need a script that compares two directories of files and, for each file that has changed between directory 1 and directory 2 (added, deleted, or modified), creates a subset containing only those changed files.
My first thought is to write a Python script that traverses each directory, computes a hash of each file, and, if the hash has changed, copies the file into the new subset. Is this a reasonable approach? Am I overlooking existing tools that already do this? I've never used it, but could something like rsync do the job?
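A minimal sketch of the hash-based approach described above. It assumes SHA-256 as the hash and keys every file by its path relative to its root directory so that subfolders line up. `hash_tree` and `diff_trees` are hypothetical names, not from any library:

```python
import hashlib
import os


def file_sha256(path, chunk_size=65536):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_tree(root):
    """Map each file's path (relative to root, subfolders included) to its SHA-256 digest."""
    digests = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            digests[os.path.relpath(full, root)] = file_sha256(full)
    return digests


def diff_trees(old_root, new_root):
    """Return sorted (added, deleted, modified) relative paths between two trees."""
    old, new = hash_tree(old_root), hash_tree(new_root)
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, deleted, modified
```

For example, `added, deleted, modified = diff_trees("dir1", "dir2")`. Comparing digests keyed by relative path means a file that moved to another folder shows up as one deletion plus one addition.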
Thanks
Edit:
The important part is being able to compile a subset of only the files that were changed: if only 3 files have changed between versions, I need just those three files copied to a new directory.
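A standard-library sketch of the "copy only the changed files" step. It assumes "changed" means added or content-modified in the new directory (deleted files have nothing to copy), and that the output directory lies outside both input directories. `copy_changed` is a hypothetical name; `filecmp.cmp(..., shallow=False)` compares file contents byte by byte instead of trusting `os.stat()` signatures:

```python
import filecmp
import os
import shutil


def copy_changed(old_dir, new_dir, out_dir):
    """Copy files that are new or different in new_dir into out_dir, keeping relative paths.

    Returns the sorted list of copied paths, relative to new_dir.
    """
    copied = []
    for folder, _dirs, files in os.walk(new_dir):
        for name in files:
            src = os.path.join(folder, name)
            rel = os.path.relpath(src, new_dir)
            old = os.path.join(old_dir, rel)
            # Unchanged: a regular file with identical bytes exists in old_dir
            if os.path.isfile(old) and filecmp.cmp(old, src, shallow=False):
                continue
            dest = os.path.join(out_dir, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(rel)
    return sorted(copied)
```

With three changed files, `copy_changed("dir1", "dir2", "changes")` would leave exactly those three files (in their original subfolders) under `changes`.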
Upvotes: 2
Views: 4901
Reputation: 11
Including subfolders and comparing hashes of the files (requires Python 3.11 or later, for `hashlib.file_digest`):
```python
# Requires Python 3.11+ (hashlib.file_digest)
import hashlib
from os import walk
from os.path import join, normpath, relpath

rep1 = normpath(input('Folder 1: '))
rep2 = normpath(input('Folder 2: '))

def hashcheck(fileloc1, fileloc2):
    """Return True if the two files have different SHA-256 digests."""
    with open(fileloc1, 'rb') as f1:
        f1hash = hashlib.file_digest(f1, "sha256").hexdigest()
    with open(fileloc2, 'rb') as f2:
        f2hash = hashlib.file_digest(f2, "sha256").hexdigest()
    return f1hash != f2hash

def relative_files(root):
    """List every file under root (subfolders included) as a path relative to root."""
    return [relpath(join(folder, name), root)
            for folder, _subdirs, names in walk(root)
            for name in names]

R1 = relative_files(rep1)
R2 = relative_files(rep2)
S1, S2 = set(R1), set(R2)  # sets make the membership tests fast

vanished = [p for p in R1 if p not in S2]
appeared = [p for p in R2 if p not in S1]
modified = [p for p in R2 if p in S1 and hashcheck(join(rep1, p), join(rep2, p))]

print('vanished==', vanished, '\n')
print('appeared==', appeared, '\n')
print('modified==', modified, '\n')
input()  # keep the console window open until Enter is pressed
```
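`hashlib.file_digest` only exists from Python 3.11 on. A sketch of a helper that uses it when available and otherwise falls back to feeding the hash object in chunks, which yields the same hex digest; `file_hexdigest` is a hypothetical name, and `hashcheck` is rewritten here to call it:

```python
import hashlib


def file_hexdigest(path, algorithm="sha256"):
    """Hex digest of a file; uses hashlib.file_digest on 3.11+, a chunked read otherwise."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):  # Python 3.11+
            return hashlib.file_digest(f, algorithm).hexdigest()
        # Older Pythons: read 64 KiB at a time so large files are not loaded whole
        h = hashlib.new(algorithm)
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
        return h.hexdigest()


def hashcheck(fileloc1, fileloc2):
    """True if the two files' SHA-256 digests differ."""
    return file_hexdigest(fileloc1) != file_hexdigest(fileloc2)
```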
Upvotes: 0
Reputation: 310
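I have modified @eyquem's answer a bit. The two directories are passed as command-line arguments:

```
python file.py dir1 dir2
```

Note: files are classified by modification time, not by content. A file present in both directories counts as modified when its copy in the second directory is newer.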
```python
#!/usr/bin/env python3
import sys
from os import listdir
from os.path import getmtime, join

ORIG_DIR = sys.argv[1]      # orig     --> /root/backup.FPSS/bin/httpd
MODIFIED_DIR = sys.argv[2]  # modified --> /FPSS/httpd/bin/httpd

# Only the top level of each directory is compared (no recursion)
LIST_OF_FILES_IN_ORIG_DIR = listdir(ORIG_DIR)
LIST_OF_FILES_IN_MODIFIED_DIR = listdir(MODIFIED_DIR)
ORIG_SET = set(LIST_OF_FILES_IN_ORIG_DIR)
MODIFIED_SET = set(LIST_OF_FILES_IN_MODIFIED_DIR)

vanished = [f for f in LIST_OF_FILES_IN_ORIG_DIR if f not in MODIFIED_SET]
appeared = [f for f in LIST_OF_FILES_IN_MODIFIED_DIR if f not in ORIG_SET]
common = [f for f in LIST_OF_FILES_IN_MODIFIED_DIR if f in ORIG_SET]
# A common file counts as modified when the copy in MODIFIED_DIR is newer
modified = [f for f in common if getmtime(join(ORIG_DIR, f)) < getmtime(join(MODIFIED_DIR, f))]
same = [f for f in common if getmtime(join(ORIG_DIR, f)) >= getmtime(join(MODIFIED_DIR, f))]

def print_list(title, arg):
    """Print the heading, then one line per file and a total count."""
    print(title)
    for f in arg:
        print('----->', f)
    print('Total :: ', len(arg))

HASHES = '#' * 99
DASHES = '-' * 101
print(HASHES)
print_list(f'Files which have vanished from MOD: {MODIFIED_DIR} but are still present in ORIG: {ORIG_DIR} ==>', vanished)
print(DASHES)
print_list(f'Files which appear in MOD: {MODIFIED_DIR} but are not present in ORIG: {ORIG_DIR} ==>', appeared)
print(DASHES)
print_list(f'Files in MOD: {MODIFIED_DIR} which are MODIFIED compared to ORIG: {ORIG_DIR} ==>', modified)
print(DASHES)
print_list(f'Files in MOD: {MODIFIED_DIR} which are NOT modified compared to ORIG: {ORIG_DIR} ==>', same)
print(HASHES)
```
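Classifying by modification time means a file that was merely touched or re-saved unchanged is reported as modified, while a changed file whose copy kept an older timestamp is reported as unchanged. If content is what matters, here is a sketch of the same top-level classification using `filecmp.cmp(..., shallow=False)`, which compares bytes; `classify` is a hypothetical name, and subdirectories are skipped:

```python
import filecmp
import os


def classify(orig_dir, modified_dir):
    """Split the top-level regular files of two directories by content rather than mtime."""
    orig = {f for f in os.listdir(orig_dir) if os.path.isfile(os.path.join(orig_dir, f))}
    mod = {f for f in os.listdir(modified_dir) if os.path.isfile(os.path.join(modified_dir, f))}
    common = orig & mod
    # Byte-by-byte comparison, so timestamps alone never mark a file as changed
    changed = {f for f in common
               if not filecmp.cmp(os.path.join(orig_dir, f),
                                  os.path.join(modified_dir, f), shallow=False)}
    return {
        "vanished": sorted(orig - mod),
        "appeared": sorted(mod - orig),
        "modified": sorted(changed),
        "same": sorted(common - changed),
    }
```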
Upvotes: 0
Reputation: 27575
It seems to me that you need something as simple as this (top-level files only, compared by modification time):
```python
from os import listdir
from os.path import getmtime, join

rep1 = 'I:\\dada'
rep2 = 'I:\\didi'

R1 = listdir(rep1)
R2 = listdir(rep2)

vanished = [filename for filename in R1 if filename not in R2]
appeared = [filename for filename in R2 if filename not in R1]
# Present in both folders, but with different modification times
modified = [filename for filename in R2
            if filename in R1 and getmtime(join(rep1, filename)) != getmtime(join(rep2, filename))]

print('vanished==', vanished)
print('appeared==', appeared)
print('modified==', modified)
```
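The standard library's `filecmp.dircmp` already computes these three lists for two directories: `left_only` (vanished), `right_only` (appeared) and `diff_files` (modified). `diff_files` relies on a shallow comparison: files whose `os.stat()` signatures (type, size, mtime) match are treated as identical without being read, and only otherwise are their contents compared. By default `dircmp` also ignores names such as `.git` and `__pycache__`. A sketch that recurses into common subdirectories, using a hypothetical `changes` helper:

```python
import filecmp
import os


def changes(left, right, prefix=""):
    """Yield (kind, relative_path) for every difference between two directory trees."""
    dc = filecmp.dircmp(left, right)
    for name in dc.left_only:    # only in left (may be a whole directory)
        yield "vanished", os.path.join(prefix, name)
    for name in dc.right_only:   # only in right
        yield "appeared", os.path.join(prefix, name)
    for name in dc.diff_files:   # in both, but the comparison says they differ
        yield "modified", os.path.join(prefix, name)
    for name in dc.common_dirs:  # recurse into subdirectories present on both sides
        yield from changes(os.path.join(left, name), os.path.join(right, name),
                           os.path.join(prefix, name))
```

Usage: `for kind, path in changes(rep1, rep2): print(kind, path)`.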
Upvotes: 3
Reputation: 56390
That is one completely reasonable approach, but you are essentially reinventing rsync. So yes, use rsync.
Edit: there is also a way to create "difference-only" folders using rsync.
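One rsync option that fits the question is `--compare-dest=DIR`: files in the source that are identical to their counterpart in DIR are not transferred, so the destination ends up holding only new and changed files. The sketch below builds and runs that command from Python. It assumes rsync is installed, passes `--checksum` so files are judged by content rather than size and mtime, and makes DIR absolute because rsync interprets a relative DIR relative to the destination directory. `rsync_changed_command` and `rsync_changed` are hypothetical names. Deleted files are not represented in the output, and empty directories may still be created.

```python
import os
import shutil
import subprocess


def rsync_changed_command(old_dir, new_dir, out_dir):
    """Build an rsync command that copies only the files of new_dir that differ from old_dir."""
    return [
        "rsync",
        "-a",          # archive mode: recurse, keep timestamps and permissions
        "--checksum",  # decide "identical" by content, not size + mtime
        "--compare-dest=" + os.path.abspath(old_dir),
        os.path.join(new_dir, ""),  # trailing separator: copy the contents of new_dir
        out_dir,
    ]


def rsync_changed(old_dir, new_dir, out_dir):
    """Run the command; raises if rsync is missing or exits with an error."""
    if shutil.which("rsync") is None:
        raise FileNotFoundError("rsync not found on PATH")
    subprocess.run(rsync_changed_command(old_dir, new_dir, out_dir), check=True)
```

On a POSIX system, `rsync_changed("dir1", "dir2", "changed")` runs the equivalent of `rsync -a --checksum --compare-dest=/absolute/path/to/dir1 dir2/ changed`.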
Upvotes: 2