Reputation: 95
I have a *.txt file in which I keep codes for some files. The format is the following:
code1, a/b/c/1.jpg
code2, a/b/c/2.jpg
code1, a/b/c/3.jpg
code2, a/b/d/4.jpg
code3, a/b/d/5.jpg
My purpose is to find files that have the same code in the same folder (duplicates). All the file names are different. If the same code occurs in different folders, like code2, a/b/c/ and code2, a/b/d/,
I want to skip it. Right now I have the following code, which collects the paths for each code across the whole *.txt document:
import csv

with open('codes.txt') as csvfile:  # 'codes.txt' stands in for the real file name
    # skipinitialspace drops the space after the comma in "code1, a/b/c/1.jpg"
    reader = csv.reader(csvfile, skipinitialspace=True)
    dataDict = dict()
    for row in reader:
        if any(row):
            # map each code to every path that carries it
            if row[0] in dataDict:
                dataDict[row[0]].append(row[1])
            else:
                dataDict[row[0]] = [row[1]]
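On the sample data above, this builds the following dict (values shown for illustration):

    # dataDict == {
    #     'code1': ['a/b/c/1.jpg', 'a/b/c/3.jpg'],
    #     'code2': ['a/b/c/2.jpg', 'a/b/d/4.jpg'],  # spans two folders
    #     'code3': ['a/b/d/5.jpg'],
    # }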
But this gives me duplicates that live in different folders, such as code2 above. I want only the duplicate files that sit in exactly the same folder.
Edit: The title is not clear. I did not know how to describe this in the title.
Upvotes: 2
Views: 67
Reputation: 276
import csv

def find_duplicates(csvfile):
    # first part: group the paths by directory, then by code
    reader = csv.reader(csvfile, skipinitialspace=True)
    dataDict = dict()
    for row in reader:
        if any(row):
            code, filename = row
            dir_path, _ = filename.rsplit('/', 1)  # keep the directory, drop the file name
            if dir_path not in dataDict:
                dataDict[dir_path] = {}
            if code not in dataDict[dir_path]:
                dataDict[dir_path][code] = []
            dataDict[dir_path][code].append(filename)

    # second part: a code with more than one path inside one directory is a duplicate
    duplicates = []
    for codes in dataDict.values():
        for paths in codes.values():
            if len(paths) > 1:
                duplicates.append(paths)
    return duplicates
The first part groups the file paths by directory and then by code.
The second part detects the codes that map to more than one path within a single directory and returns those groups.
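A minimal usage sketch, assuming the find_duplicates function above; io.StringIO stands in for the real file so the example is self-contained:

import io

sample = io.StringIO(
    "code1, a/b/c/1.jpg\n"
    "code2, a/b/c/2.jpg\n"
    "code1, a/b/c/3.jpg\n"
    "code2, a/b/d/4.jpg\n"
    "code3, a/b/d/5.jpg\n"
)

print(find_duplicates(sample))
# [['a/b/c/1.jpg', 'a/b/c/3.jpg']]
# code2 spans a/b/c and a/b/d, so it is skipped; code3 appears only once

A collections.defaultdict keyed on the (dir_path, code) pair could replace the two nested membership checks with a single append, but the nested dict above keeps the per-directory grouping easy to inspect.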
Upvotes: 2