Reputation: 21280
From what I observe, filecmp.dircmp is recursive, but inadequate for my needs, at least in py2. I want to compare two directories and all their contained files. Does this exist, or do I need to build it myself (using os.walk, for example)? I prefer pre-built, where someone else has already done the unit-testing :)
The actual 'comparison' can be sloppy (ignore permissions, for example), if that helps.
I would like something boolean, and report_full_closure is a printed report. It also only goes down common subdirs. AFAIC, if either directory contains anything that exists only on the left or only on the right, the directories are different. I built this using os.walk instead.
Upvotes: 48
Views: 87018
Reputation: 3382
Here's a tiny hack that needs no recursion or comparison logic of our own:
import contextlib
import filecmp
import io
import re

def are_dirs_equal(a, b) -> bool:
    stdout = io.StringIO()
    # capture the printed report instead of letting it reach the console
    with contextlib.redirect_stdout(stdout):
        filecmp.dircmp(a, b).report_full_closure()
    # "Differing files" and "Only in" lines appear only when the trees differ
    return re.search("Differing files|Only in", stdout.getvalue()) is None
Upvotes: 1
Reputation: 656
Based on @Mateusz Kobos's currently accepted answer, it turns out that the second filecmp.cmpfiles call with shallow=False is not necessary, so we've removed it. One can get dirs_cmp.diff_files from the first dircmp. A common misunderstanding (one that we made as well!) is that dircmp is shallow only and doesn't compare file contents. Turns out that is not true! shallow=True is only a time saver: files whose os.stat signatures (file type, size and last modification time) are identical are treated as equal without being read, and files of different sizes are treated as different straight away. If the sizes match but the last modification times differ, each file's contents are read and compared. If the contents are identical, the files match even though their last modification dates differ! The flip side is that two files with the same size and the same modification time are never read, so a content difference between them goes unnoticed. We've added verbose prints here for added clarity. See elsewhere (filecmp.cmp() ignoring differing os.stat() signatures?) if you want a difference in st_mtime to count as a mismatch. We also switched from the os module to the newer pathlib.
import filecmp
from pathlib import Path

def compare_directories_recursive(dir1: Path, dir2: Path, verbose=True):
    """
    Compares two directories recursively.
    First, the names of the files and subdirectories in each directory are compared.
    Second, files are assumed to be equal if their names, size and last modified date are equal (aka shallow=True in python terms)
    If last modified date is different, then the contents are compared by reading each file.
    Caveat: if the contents are equal and last modified is NOT equal, files are still considered equal!
    This caveat is the default python filecmp behavior as unintuitive as it may seem.
    @param dir1: First directory path
    @param dir2: Second directory path
    @param verbose: print the reason whenever a mismatch is found
    """
    dirs_cmp = filecmp.dircmp(str(dir1), str(dir2))
    if len(dirs_cmp.left_only) > 0:
        if verbose:
            print(f"Should not be any more files in original than in destination left_only: {dirs_cmp.left_only}")
        return False
    if len(dirs_cmp.right_only) > 0:
        if verbose:
            print(f"Should not be any more files in destination than in original right_only: {dirs_cmp.right_only}")
        return False
    if len(dirs_cmp.funny_files) > 0:
        if verbose:
            print(f"There should not be any funny files between original and destination. These file(s) are funny {dirs_cmp.funny_files}")
        return False
    if len(dirs_cmp.diff_files) > 0:
        if verbose:
            print(f"There should not be any different files between original and destination. These file(s) are different {dirs_cmp.diff_files}")
        return False
    for common_dir in dirs_cmp.common_dirs:
        new_dir1 = Path(dir1).joinpath(common_dir)
        new_dir2 = Path(dir2).joinpath(common_dir)
        # pass verbose down so mismatches in nested directories are reported too
        if not compare_directories_recursive(new_dir1, new_dir2, verbose):
            return False
    return True
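The shallow=True behaviour described in this answer can be checked in isolation with filecmp.cmp. This is a small standalone sketch using only the standard library: it writes three throwaway files into a temporary directory and fixes their modification times with os.utime so the stat signatures are predictable. The file names and timestamps are arbitrary choices for the demo.

```python
import filecmp
import os
import tempfile

# a and b hold the same bytes but get different mtimes;
# c holds different bytes of the same length and gets a's mtime.
with tempfile.TemporaryDirectory() as tmp:
    a, b, c = (os.path.join(tmp, name) for name in ("a.txt", "b.txt", "c.txt"))
    for path, data in ((a, b"hello"), (b, b"hello"), (c, b"HELLO")):
        with open(path, "wb") as f:
            f.write(data)
    os.utime(a, (1_000_000_000, 1_000_000_000))
    os.utime(b, (1_500_000_000, 1_500_000_000))
    os.utime(c, (1_000_000_000, 1_000_000_000))

    # Signatures differ (mtime only), so the contents are read and found equal.
    same_content_new_mtime = filecmp.cmp(a, b, shallow=True)
    # Signatures match (type, size, mtime), so the contents are never read.
    shallow_result = filecmp.cmp(a, c, shallow=True)
    # shallow=False always compares the bytes.
    deep_result = filecmp.cmp(a, c, shallow=False)

print(same_content_new_mtime, shallow_result, deep_result)  # True True False
```

The middle case is the one to watch: a shallow comparison can report two files with different contents as equal when their size and modification time happen to match.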
Upvotes: 0
Reputation: 317
To anyone looking for a simple library:
https://github.com/mitar/python-deep-dircmp
DeepDirCmp basically subclasses filecmp.dircmp and shows output identical to diff -qr dir1 dir2.
Usage:
from deep_dircmp import DeepDirCmp

cmp = DeepDirCmp(dir1, dir2)
if len(cmp.get_diff_files_recursive()) == 0:
    print("Dirs match")
else:
    print("Dirs don't match")
Upvotes: 0
Reputation: 457
This recursive function seems to work for me:
def has_differences(dcmp):
    differences = dcmp.left_only + dcmp.right_only + dcmp.diff_files
    if differences:
        return True
    return any(has_differences(subdcmp) for subdcmp in dcmp.subdirs.values())
Assuming I haven't overlooked anything, you could just negate the result if you want to know whether the directories are the same:
from filecmp import dircmp

comparison = dircmp("dir1", "dir2")
same = not has_differences(comparison)
Upvotes: 3
Reputation: 14791
Here is a simple solution with a recursive function:
import filecmp

def same_folders(dcmp):
    if dcmp.diff_files or dcmp.left_only or dcmp.right_only:
        return False
    for sub_dcmp in dcmp.subdirs.values():
        if not same_folders(sub_dcmp):
            return False
    return True

same_folders(filecmp.dircmp('/tmp/archive1', '/tmp/archive2'))
Upvotes: 9
Reputation: 4283
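Since a True or False result is all you want, if you have diff installed:
from subprocess import DEVNULL, Popen

def are_dir_trees_equal(dir1, dir2):
    # Discard diff's output: with stdout=PIPE and wait(), a large diff could
    # fill the pipe buffer and deadlock.
    process = Popen(["diff", "-r", dir1, dir2], stdout=DEVNULL)
    exit_code = process.wait()
    # diff exits with 0 when the trees are identical, 1 when they differ, 2 on trouble
    return exit_code == 0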
Upvotes: 2
Reputation: 416
This will check if files are in the same locations and if their content is the same. It will not correctly validate for empty subfolders.
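import filecmp
import glob
import os

path_1 = '.'
path_2 = '.'

def folders_equal(f1, f2):
    # Every file (not directory) below each root, sorted so the two lists line up.
    # Note: glob's '**' pattern skips hidden (dot) files.
    files_1 = sorted(x for x in glob.iglob(os.path.join(f1, '**'), recursive=True) if os.path.isfile(x))
    files_2 = sorted(x for x in glob.iglob(os.path.join(f2, '**'), recursive=True) if os.path.isfile(x))
    # zip() silently truncates, so a different file count already means a mismatch
    if len(files_1) != len(files_2):
        return False
    file_pairs = list(zip(files_1, files_2))
    # every pair must sit at the same relative location...
    locations_equal = all(os.path.relpath(x, f1) == os.path.relpath(y, f2) for x, y in file_pairs)
    # ...and have the same content (shallow=False forces a byte comparison)
    files_equal = all(filecmp.cmp(x, y, shallow=False) for x, y in file_pairs)
    return locations_equal and files_equal

folders_equal(path_1, path_2)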
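The approach above lets empty subfolders slip through. A minimal sketch that also catches them, assuming only the standard library: collect the relative paths of every directory and every file under each root with os.walk (which, unlike glob's '**', also sees hidden files), compare the two sets, then compare each common file byte by byte with filecmp.cmp(shallow=False). The names relative_entries and trees_equal are hypothetical, and symlinks are not given any special treatment.

```python
import filecmp
import os


def relative_entries(root):
    """Return (set of relative dir paths, set of relative file paths) under root."""
    dirs, files = set(), set()
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # normpath turns "./name" into "name" for entries directly under root
        for d in dirnames:
            dirs.add(os.path.normpath(os.path.join(rel, d)))
        for f in filenames:
            files.add(os.path.normpath(os.path.join(rel, f)))
    return dirs, files


def trees_equal(root1, root2):
    dirs1, files1 = relative_entries(root1)
    dirs2, files2 = relative_entries(root2)
    # Anything present on one side only, including an empty folder, is a difference.
    if dirs1 != dirs2 or files1 != files2:
        return False
    # Same layout: compare the contents of every file pair.
    return all(
        filecmp.cmp(os.path.join(root1, f), os.path.join(root2, f), shallow=False)
        for f in files1
    )
```

Comparing path sets first means a layout mismatch is detected without reading any file contents.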
Upvotes: 0
Reputation: 23
Based on python issue 12932 and the filecmp documentation, you may use the following example:
import os
import filecmp

# force content compare instead of os.stat attributes only comparison
# (this changes the default of filecmp.cmpfiles for the whole process)
filecmp.cmpfiles.__defaults__ = (False,)

def _is_same_helper(dircmp):
    if dircmp.left_only or dircmp.right_only or dircmp.diff_files or dircmp.funny_files:
        return False
    for sub_dircmp in dircmp.subdirs.values():
        if not _is_same_helper(sub_dircmp):
            return False
    return True

def is_same(dir1, dir2):
    """
    Recursively compare two directories
    :param dir1: path to first directory
    :param dir2: path to second directory
    :return: True in case directories are the same, False otherwise
    """
    if not os.path.isdir(dir1) or not os.path.isdir(dir2):
        return False
    dircmp = filecmp.dircmp(dir1, dir2)
    return _is_same_helper(dircmp)
Upvotes: 1
Reputation: 2025
filecmp.dircmp is the way to go, but by default it does not always compare the content of files found at the same path in the two directories: with its default shallow comparison, files whose os.stat attributes (file type, size and modification time) match are considered identical without being read. Since dircmp is a class, you can fix that with a dircmp subclass that overrides its phase3 method, which compares the common files, so that content is always compared instead of relying on os.stat attributes.
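import filecmp

class dircmp(filecmp.dircmp):
    """
    Compare the content of dir1 and dir2. In contrast with filecmp.dircmp, this
    subclass compares the content of files with the same path.
    """
    def phase3(self):
        """
        Find out differences between common files.
        Ensure we are using content comparison with shallow=False.
        """
        fcomp = filecmp.cmpfiles(self.left, self.right, self.common_files,
                                 shallow=False)
        self.same_files, self.diff_files, self.funny_files = fcomp

    # dircmp computes its attributes lazily through the methodmap class
    # attribute, which still points at the base-class phase3; redirect the
    # phase-3 attributes to the override above.
    methodmap = dict(filecmp.dircmp.methodmap,
                     same_files=phase3, diff_files=phase3, funny_files=phase3)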
Then you can use this to return a boolean:
import os.path

def is_same(dir1, dir2):
    """
    Compare two directory trees content.
    Return False if they differ, True if they are the same.
    """
    compared = dircmp(dir1, dir2)
    if (compared.left_only or compared.right_only or compared.diff_files
            or compared.funny_files):
        return False
    for subdir in compared.common_dirs:
        if not is_same(os.path.join(dir1, subdir), os.path.join(dir2, subdir)):
            return False
    return True
In case you want to reuse this code snippet, it is hereby dedicated to the Public Domain or the Creative Commons CC0 at your choice (in addition to the default license CC-BY-SA provided by SO).
Upvotes: 27
Reputation: 315
Another solution: compare the layout of dir1 and dir2, ignoring the content of files.
See gist here: https://gist.github.com/4164344
Edit: here's the code, in case the gist gets lost for some reason:
import os

def compare_dir_layout(dir1, dir2):
    def _compare_dir_layout(dir1, dir2):
        for (dirpath, dirnames, filenames) in os.walk(dir1):
            # path of the current directory relative to dir1, e.g. "sub/dir"
            relative_path = os.path.relpath(dirpath, dir1)
            for filename in filenames:
                if not os.path.exists(os.path.join(dir2, relative_path, filename)):
                    print(relative_path, filename)

    print('files in "' + dir1 + '" but not in "' + dir2 + '"')
    _compare_dir_layout(dir1, dir2)
    print('files in "' + dir2 + '" but not in "' + dir1 + '"')
    _compare_dir_layout(dir2, dir1)

compare_dir_layout('xxx', 'yyy')
Upvotes: 3
Reputation: 1456
import filecmp
import os

def same(dir1, dir2):
    """Returns True if recursively identical, False otherwise
    """
    c = filecmp.dircmp(dir1, dir2)
    if c.left_only or c.right_only or c.diff_files or c.funny_files:
        return False
    else:
        same_so_far = True
        for i in c.common_dirs:
            same_so_far = same_so_far and same(os.path.join(dir1, i), os.path.join(dir2, i))
            if not same_so_far:
                break
        return same_so_far
Upvotes: 0
Reputation: 641
Here's an alternative implementation of the comparison function with the filecmp module. It uses recursion instead of os.walk, so it is a little simpler. However, it does not recurse simply by using the common_dirs and subdirs attributes, since in that case we would be implicitly using the default "shallow" implementation of file comparison, which is probably not what you want. In the implementation below, when comparing files with the same name, we always compare only their contents.
import filecmp
import os.path

def are_dir_trees_equal(dir1, dir2):
    """
    Compare two directories recursively. Files in each directory are
    assumed to be equal if their names and contents are equal.

    @param dir1: First directory path
    @param dir2: Second directory path

    @return: True if the directory trees are the same and
        there were no errors while accessing the directories or files,
        False otherwise.
    """
    dirs_cmp = filecmp.dircmp(dir1, dir2)
    if len(dirs_cmp.left_only)>0 or len(dirs_cmp.right_only)>0 or \
        len(dirs_cmp.funny_files)>0:
        return False
    (_, mismatch, errors) = filecmp.cmpfiles(
        dir1, dir2, dirs_cmp.common_files, shallow=False)
    if len(mismatch)>0 or len(errors)>0:
        return False
    for common_dir in dirs_cmp.common_dirs:
        new_dir1 = os.path.join(dir1, common_dir)
        new_dir2 = os.path.join(dir2, common_dir)
        if not are_dir_trees_equal(new_dir1, new_dir2):
            return False
    return True
Upvotes: 38
Reputation: 21280
Here is my solution: gist
import filecmp
import os

def dirs_same_enough(dir1, dir2, report=False):
    ''' use os.walk and filecmp.cmpfiles to
    determine if two dirs are 'same enough'.

    Args:
        dir1, dir2: two directory paths
        report: if True, print the filecmp.dircmp(dir1,dir2).report_full_closure()
            before returning

    Returns:
        bool
    '''
    # os walk: root, list(dirs), list(files)
    # those lists won't have consistent ordering,
    # os.walk also has no guaranteed ordering, so have to sort.
    walk1 = sorted(os.walk(dir1))
    walk2 = sorted(os.walk(dir2))

    def report_and_exit(report, bool_):
        if report:
            filecmp.dircmp(dir1, dir2).report_full_closure()
        return bool_

    if len(walk1) != len(walk2):
        return report_and_exit(report, False)
    for (p1, d1, fl1), (p2, d2, fl2) in zip(walk1, walk2):
        d1, fl1, d2, fl2 = set(d1), set(fl1), set(d2), set(fl2)
        if d1 != d2 or fl1 != fl2:
            return report_and_exit(report, False)
        # one cmpfiles call compares all files of this directory by content
        same, diff, weird = filecmp.cmpfiles(p1, p2, fl1, shallow=False)
        if diff or weird:
            return report_and_exit(report, False)
    return report_and_exit(report, True)
Upvotes: 0
Reputation: 123662
dircmp can be recursive: see report_full_closure.
As far as I know dircmp does not offer a directory comparison function. It would be very easy to write your own, though; use left_only and right_only on dircmp to check that the files in the directories are the same and then recurse on the subdirs attribute.
Upvotes: 3
Reputation: 9407
The report_full_closure() method is recursive:
comparison = filecmp.dircmp('/directory1', '/directory2')
comparison.report_full_closure()
Edit: After the OP's edit, I would say that it's best to just use the other functions in filecmp. I think os.walk is unnecessary; better to simply recurse through the lists produced by common_dirs, etc., although in some cases (very deep directory trees) this might risk hitting Python's maximum recursion depth if implemented poorly.
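If recursion depth is a concern, the same traversal can be done with an explicit queue over the dircmp.subdirs mapping instead of recursive calls. This is a sketch, not a library routine: dirs_equal_iterative is a hypothetical name, and diff_files here still relies on dircmp's default shallow (os.stat signature) comparison, so it inherits that caveat.

```python
import filecmp
from collections import deque


def dirs_equal_iterative(dir1, dir2):
    """Compare two directory trees without recursion, using a work queue."""
    pending = deque([filecmp.dircmp(dir1, dir2)])
    while pending:
        node = pending.popleft()
        # These attributes are computed lazily by dircmp for this directory level only.
        if node.left_only or node.right_only or node.diff_files or node.funny_files:
            return False
        # subdirs maps each common subdirectory name to its own dircmp object.
        pending.extend(node.subdirs.values())
    return True
```

The queue only ever holds dircmp objects for directories that have not been checked yet, so tree depth no longer affects the Python call stack.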
Upvotes: 6