I have a master folder containing many subfolders and files, and a child folder.
When a new file is added to the master folder, I need to copy it into the child folder: overwrite the child's copy if the file already exists there, or create it (together with any subfolders it sits in) if it does not. However, I don't want to delete any file that is present in the child folder but missing from the master folder.
I am calculating the MD5 checksum of every file in the child and master folders to work out which files need to be updated or created.
import os
import hashlib


def md5_checksum(filename):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    m = hashlib.md5()
    with open(filename, 'rb') as f:
        for data in iter(lambda: f.read(1024 * 1024), b''):
            m.update(data)
    return m.hexdigest()


def getListOfFiles(dirName):
    """Recursively list the files under dirName as 'path::md5' strings."""
    listOfFile = os.listdir(dirName)
    allFiles = list()
    for entry in listOfFile:
        fullPath = os.path.join(dirName, entry)
        if os.path.isdir(fullPath):
            # Descend into the subfolder and collect its files too
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath + "::" + md5_checksum(fullPath))
    return allFiles


local_path = r'C:\test'
incoming_path = os.path.join(local_path, 'Incoming')  # Master folder
existing_path = os.path.join(local_path, 'Colors')    # Child folder

existing_list = getListOfFiles(existing_path)
download_list = getListOfFiles(incoming_path)

# Collect the checksums of everything already in the child folder
existing_md5 = set()
for file in existing_list:
    existing_md5.add(file.split('::')[1])

# Print master-folder files whose content is not yet in the child folder
for file in download_list:
    if file.split('::')[1] not in existing_md5:
        print(file.split('::')[0])
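One caveat with the code above: because only checksums are compared, a master file is treated as present whenever identical content exists anywhere in the child folder, even under a different subfolder or name. A minimal sketch that keys each file by its path relative to the folder root instead, assuming a file counts as up to date only when the same relative path exists in the child folder with the same checksum. `checksums_by_relpath` and `files_to_sync` are hypothetical helper names; `os.walk` and `os.path.relpath` are standard library.

```python
import hashlib
import os


def md5_checksum(filename):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    m = hashlib.md5()
    with open(filename, 'rb') as f:
        for data in iter(lambda: f.read(1024 * 1024), b''):
            m.update(data)
    return m.hexdigest()


def checksums_by_relpath(root):
    """Hypothetical helper: map each file's path relative to root to its MD5."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            result[os.path.relpath(full_path, root)] = md5_checksum(full_path)
    return result


def files_to_sync(master, child):
    """Hypothetical helper: relative paths missing from child or changed in master.

    Files that exist only in child are never listed, so nothing is deleted.
    """
    master_sums = checksums_by_relpath(master)
    child_sums = checksums_by_relpath(child)
    # A missing path gives None, which never equals a real digest
    return sorted(rel for rel, digest in master_sums.items()
                  if child_sums.get(rel) != digest)
```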
What I'm not sure about is how to reproduce the subfolder structure in the child folder while copying the files. How can I do that?
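A minimal standard-library sketch of the copying step, assuming you already have the paths (relative to the master folder) that need copying. `copy_preserving_structure` is a hypothetical helper name. `os.makedirs(..., exist_ok=True)` recreates any missing subfolders, and `shutil.copy2` overwrites an existing destination file; nothing already in the child folder is deleted.

```python
import os
import shutil


def copy_preserving_structure(src_root, dst_root, rel_paths):
    """Hypothetical helper: copy each relative path from src_root to dst_root.

    Missing parent folders are created first and existing destination files
    are overwritten. Files that exist only in dst_root are left untouched.
    """
    for rel in rel_paths:
        src = os.path.join(src_root, rel)
        dst = os.path.join(dst_root, rel)
        # Recreate the subfolder chain under dst_root if it is missing
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        # copy2 replaces any existing dst and also keeps the file timestamps
        shutil.copy2(src, dst)
```

With the question's variables, each path printed from `download_list` could be converted with `os.path.relpath(file.split('::')[0], incoming_path)` and the resulting list passed together with `incoming_path` and `existing_path`.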
It turns out there is a Python library for exactly this requirement, called dirsync:
https://pypi.org/project/dirsync/