Reputation: 43
Using Windows 7, I have two folders, a “Master” folder where I work on the files, and a “Backup” folder on a NAS4Free server.
I have over 800 jpg files, totaling 2.6GB, ranging in size from 124KB to 16MB.
I frequently “swap” file names, i.e.:
rename 01-020.jpg 99-020.jpg
rename 01-040.jpg 01-020.jpg
rename 99-020.jpg 01-040.jpg
I also add new files - 01-030.jpg - then renumber the set, i.e.:
rename 01-020.jpg 99-020.jpg
rename 01-030.jpg 99-040.jpg
rename 01-040.jpg 99-060.jpg
rename 99-020.jpg 01-020.jpg
rename 99-040.jpg 01-040.jpg
rename 99-060.jpg 01-060.jpg
To keep the Master and Backup folders in sync, I first looked at doing an XCOPY or ROBOCOPY of the entire folder, but that is too time-consuming, especially since the vast majority of the files haven’t changed.
I’m trying to come up with a Python 3 solution. I’ve read the documentation on filecmp.cmp(). What worries me is the statement:
“…returns True if they seem equal…” (emphasis mine).
Specifying shallow=False seems to be overkill, causing filecmp to compare the contents of 1,600+ files, when the vast majority of the comparisons will match.
Specifying shallow=True causes filecmp to use the os.stat() function. When I run that function on two files for which filecmp returns True, some of the values returned by stat are identical and others are different. Apparently, filecmp doesn’t use ALL the values returned by stat to determine whether the files are equal.
So, my question: Under what “real-world” situations will filecmp.cmp(file1, file2, shallow=True) return a false positive or a false negative? Can I trust it?
And, a possible “sub-question”, which specific values returned by os.stat() does filecmp.cmp() use?
(If you’re curious what I’m doing with the files, I discuss it here: https://hikearizona.com/dex2/viewtopic.php?f=78&t=9538)
Upvotes: 2
Views: 1393
Reputation: 4069
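With shallow=True, the comparison returns True without reading the file contents when the file type, size, and modification time of the two files are the same. It can therefore return a false positive only if the two files differ in content yet have exactly the same size and the same modification time. If those values differ, cmp does not return False outright: it rejects files of different sizes and otherwise falls back to comparing the contents, so a changed modification time alone does not produce a false negative.
The relevant parts of the module source confirm this: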
Excerpt from the cmp function implementation (filecmp.py):
s1 = _sig(os.stat(f1))
s2 = _sig(os.stat(f2))
if shallow and s1 == s2:
    return True
The _sig function used above (filecmp.py):
def _sig(st):
    return (stat.S_IFMT(st.st_mode),
            st.st_size,
            st.st_mtime)
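To see both cases in practice, here is a minimal sketch, not taken from the module itself. The helper names (make_file, compare_both_ways) and the file names are hypothetical, and the modification times are forced with os.utime purely to reproduce the situation. The first pair has the same size and modification time but different bytes; the second pair has the same bytes but a different modification time.

```python
import filecmp
import os
import tempfile


def make_file(path, data, mtime):
    """Write data to path, then force its access and modification time (hypothetical helper)."""
    with open(path, "wb") as handle:
        handle.write(data)
    os.utime(path, (mtime, mtime))


def compare_both_ways(file1, file2):
    """Return (shallow_result, deep_result) for the two files (hypothetical helper)."""
    filecmp.clear_cache()  # make sure no earlier result is reused
    shallow = filecmp.cmp(file1, file2, shallow=True)
    deep = filecmp.cmp(file1, file2, shallow=False)
    return shallow, deep


with tempfile.TemporaryDirectory() as folder:
    master = os.path.join(folder, "master.jpg")
    backup = os.path.join(folder, "backup.jpg")

    # Same size and same mtime, but different bytes: the shallow check is fooled.
    make_file(master, b"AAAA", 1_000_000_000)
    make_file(backup, b"BBBB", 1_000_000_000)
    false_positive = compare_both_ways(master, backup)

    # Same bytes, different mtime: the signatures differ, so cmp reads the contents.
    make_file(backup, b"AAAA", 1_000_000_100)
    touched_copy = compare_both_ways(master, backup)

print(false_positive)  # (True, False)
print(touched_copy)    # (True, True)
```

For the rename-and-swap workflow in the question, the first case is the one to watch: a copy that preserves timestamps and happens to match in size would pass the shallow check even after a swap, so shallow=False (or a content hash) is the safer choice if that can happen.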
Upvotes: 1