Reputation: 25
I am trying to iterate through directories and subdirectories to find duplicate files, but my script is giving the following error:
Traceback (most recent call last):
  File "./fileDupchknew.py", line 29, in <module>
    dup_fileremove(dirname)
  File "./fileDupchknew.py", line 26, in dup_fileremove
    os.remove(filepath)
OSError: [Errno 21] Is a directory: '/tmp/rishabh-test/new-test'
Script:
#!/usr/bin/python
import os
import hashlib
import sys

dirname = sys.argv[1]
os.chdir(dirname)

def dup_fileremove(dir):
    duplicate = set()
    os.chdir(dir)
    path = os.getcwd()
    print ("The dir is: ", path)
    for filename in os.listdir(dir):
        filehash = None
        filepath = os.path.join(dir, filename)
        print("Current file path is: ", filepath)
        if os.path.isdir(filepath):
            dup_fileremove(filepath)
        elif os.path.isfile(filepath):
            filehash = hashlib.md5(file(filepath).read()).hexdigest()
        if filehash not in duplicate:
            duplicate.add(filehash)
        else:
            os.remove(filepath)
            print("removed : ", filepath)

dup_fileremove(dirname)
Upvotes: 2
Views: 5174
Reputation: 90889
Since you do not want to delete directories (as can be seen from the comments on the question) -
No i don't want to delete directories
If that is the case, your issue occurs because you never create a filehash for directories. When the entry is a directory, filehash stays None; for the first directory, None is not present in the duplicate set, so None gets added to the set. For every directory after that, None is already in the set, so the script falls through to os.remove() on the directory path, which causes the error.
A simple fix would be to check whether filehash is None before adding it to the set and before trying to remove the file. Example -
#!/usr/bin/python
import os
import hashlib
import sys

dirname = sys.argv[1]
os.chdir(dirname)

def dup_fileremove(dir):
    duplicate = set()
    os.chdir(dir)
    path = os.getcwd()
    print ("The dir is: ", path)
    for filename in os.listdir(dir):
        filehash = None
        filepath = os.path.join(dir, filename)
        print("Current file path is: ", filepath)
        if os.path.isdir(filepath):
            dup_fileremove(filepath)
        elif os.path.isfile(filepath):
            filehash = hashlib.md5(file(filepath).read()).hexdigest()
        if filehash is not None and filehash not in duplicate:
            duplicate.add(filehash)
        elif filehash is not None:
            os.remove(filepath)
            print("removed : ", filepath)

dup_fileremove(dirname)
Upvotes: 1
Reputation: 55469
You're actually lucky you got that error message; otherwise your code would have tried to delete directories!
The problem is that after control returns from the recursive call to dup_fileremove(filepath), execution continues on to the if filehash not in duplicate: check. You don't want that!
A simple way to fix it is to put a continue statement after dup_fileremove(filepath), as in the sketch below.
But a much better fix is to indent the if filehash not in duplicate: block so that it's aligned with the filehash = hashlib.md5(file(filepath).read()).hexdigest() line.
For example:
#!/usr/bin/python
import os
import hashlib
import sys

def dup_fileremove(dirname):
    duplicate = set()
    os.chdir(dirname)
    path = os.getcwd()
    print ("The dirname is: ", path)
    for filename in os.listdir(dirname):
        filehash = None
        filepath = os.path.join(dirname, filename)
        print("Current file path is: ", filepath)
        if os.path.isdir(filepath):
            dup_fileremove(filepath)
        elif os.path.isfile(filepath):
            filehash = hashlib.md5(file(filepath).read()).hexdigest()
            if filehash not in duplicate:
                duplicate.add(filehash)
            else:
                os.remove(filepath)
                print("removed : ", filepath)

dirname = sys.argv[1]
os.chdir(dirname)
dup_fileremove(dirname)
I haven't tested this modified version of your code. It looks ok, but I make no guarantees. :)
BTW, it is recommended not to use the file() class directly to open files. In Python 3, file() no longer exists, and even in Python 2 the docs have recommended the open() function since at least Python 2.5, if not earlier.
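For example, the hashing line could use open() with a context manager instead (a sketch; opening in binary mode keeps the hashing behaviour the same):
        elif os.path.isfile(filepath):
            with open(filepath, 'rb') as f:
                # read in binary mode so the hashed bytes match the file contents exactly
                filehash = hashlib.md5(f.read()).hexdigest()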
Upvotes: 1