Reputation: 309
I have the function below, which reads a previously constructed file into a defaultdict. The file is a CSV containing file sizes and file paths.
If more than one file has the same size as another, those files are run through a hashing function.
The issue I have is that print gives me the expected output, whereas writing the output to a file does not.
def loadfiles():
    '''Loads files and identifies potential duplicates'''
    files = defaultdict(list)  # uses defaultdict
    with open(tmpfile) as csvfile:  # reads the file into a dictionary
        reader = csv.DictReader(csvfile)
        for row in reader:
            files[row['size']].append(row['file'])
    for key, value in files.items():
        if len([item for item in value if item]) > 1:
            with open(reportname, 'w') as fr:
                writer = csv.writer(fr, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
                writer.writerow(['size','filename','hash'])
                for value in value:
                    writer.writerow([key,value,str(md5Checksum(value))])
                    print(key, value, str(md5Checksum(value)))
The output to a file is this:
size,filename,hash
43842270,/home/bob/scripts/inprogress_python_scripts/file_dup/testingscript/webwolf-8.0.0.M25.jar,b325dc62d33e2ada19aea07cbcfb237f
43842270,/home/bob/scripts/inprogress_python_scripts/file_dup/testingscript/bkwolf.jar,b325dc62d33e2ada19aea07cbcfb237f
Whereas the output to the screen from print is this:
128555 /home/bob/scripts/inprogress_python_scripts/file_dup/testingscript/SN0aaa(1).pdf def426a8dee8f226e40df826fcde9904
128555 /home/bob/scripts/inprogress_python_scripts/file_dup/testingscript/SN0aaa(1) (another copy).pdf def426a8dee8f226e40df826fcde9904
128555 /home/bob/scripts/inprogress_python_scripts/file_dup/testingscript/SN0aaa.pdf def426a8dee8f226e40df826fcde9904
128555 /home/bob/scripts/inprogress_python_scripts/file_dup/testingscript/SN0aaa(1) (copy).pdf def426a8dee8f226e40df826fcde9904
43842270 /home/bob/scripts/inprogress_python_scripts/file_dup/testingscript/webwolf-8.0.0.M25.jar b325dc62d33e2ada19aea07cbcfb237f
43842270 /home/b/scripts/inprogress_python_scripts/file_dup/testingscript/bkwolf.jar b325dc62d33e2ada19aea07cbcfb237f
Any ideas or guidance as to what's wrong, please?
Upvotes: 0
Views: 48
Reputation: 5104
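Opening the file with "w" puts it in write mode, which truncates the file and discards anything already in it. Because your open() call sits inside the loop, the report is recreated for every group of same-sized files, so only the last group survives. Use "a" for append instead.
Appending from inside the loop has its own problem, though: the header (size,filename,hash) gets written once per group. Consider writing it once, as the very first line, before the loop rather than inside it.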
See, for example: https://www.w3schools.com/python/python_file_write.asp
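As a rough sketch of the "header once, outside the loop" idea, the report could be opened a single time before iterating over the size groups. The names write_duplicate_report, report_path and md5_checksum are made up for this illustration (md5_checksum is only a stand-in for the question's md5Checksum, whose code isn't shown), and the files argument is assumed to be the size-to-paths dictionary built earlier in loadfiles(). Unlike the original loop, this version also skips empty path entries before hashing.

```python
import csv
import hashlib


def md5_checksum(path, chunk_size=65536):
    """Hypothetical stand-in for the question's md5Checksum helper."""
    digest = hashlib.md5()
    with open(path, 'rb') as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def write_duplicate_report(files, report_path, hash_func=md5_checksum):
    """Write one row for every file whose size is shared with another file."""
    # Open the report once, before looping, so "w" truncates it only one time.
    with open(report_path, 'w', newline='') as fr:
        writer = csv.writer(fr, delimiter=',', quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
        writer.writerow(['size', 'filename', 'hash'])  # header written once
        for size, paths in files.items():
            candidates = [p for p in paths if p]  # ignore empty path entries
            if len(candidates) > 1:
                for path in candidates:
                    writer.writerow([size, path, hash_func(path)])
```

With this layout "w" is fine again: the file is truncated once per run, so the report doesn't keep growing across runs the way it would with "a".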
Upvotes: 1