Reputation: 213
This question appeared in a Python coding competition, and I was wondering how it can be solved.
Problem statement:
You have two directories (with possible subdirectories in them). Your script should find duplicate files by comparing the contents of identically named files in the two root directories.
Result: FAIL if the contents of at least one same-named file differ;
PASS otherwise.
Here's a sample figure
/dir1              /dir2
-- file1           -- file1
-- file2           -- fileA
-- file3           -- fileB
-- ...             -- ...
---/subDir1
---- file1
---- file2
file1 of dir1 contains: foo bar
file1 of dir2 contains: foo
Result: FAIL

file1 of dir1 contains: foo bar
file1 of dir2 contains: foo bar
Result: PASS
I attempted to use file sizes as a quick hash, but that obviously wasn't the right approach :)
PS: Any scripting language can be used.
Thanks Kelly
Upvotes: 1
Views: 668
Reputation: 25197
Have a look at the filecmp module in the standard library.
Computing hashes is not useful when each file is compared to just one other file. The entire file must be read to compute a hash, and read again to confirm a match. By contrast, a direct comparison can be aborted at first difference.
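To make the suggestion concrete, here is a minimal sketch built on filecmp (the function name same_named_files_match and the recursion strategy are my own, not from the answer). filecmp.dircmp finds files common to both trees; filecmp.cmp with shallow=False then compares contents byte by byte, stopping at the first difference.

```python
import filecmp
import os

def same_named_files_match(root1, root2):
    """Return True (PASS) if every filename present in both trees has
    identical contents; False (FAIL) otherwise."""
    cmp = filecmp.dircmp(root1, root2)
    # dircmp's own diff_files uses a shallow (os.stat-based) comparison,
    # so re-check every common file byte by byte with shallow=False.
    for name in cmp.common_files:
        if not filecmp.cmp(os.path.join(root1, name),
                           os.path.join(root2, name), shallow=False):
            return False
    # Recurse into subdirectories that exist in both trees; a subdirectory
    # present on only one side has no same-named counterpart to compare.
    for sub in cmp.common_dirs:
        if not same_named_files_match(os.path.join(root1, sub),
                                      os.path.join(root2, sub)):
            return False
    return True
```

Files that exist on only one side (fileA, fileB in the sample figure) are simply skipped, matching the problem statement's focus on same-named files.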
Upvotes: 1
Reputation: 9402
You can solve this in a tiered manner.
Upvotes: 3