jonathan eslava

Reputation: 95

Finding values that start with a specific substring

I have a *.txt file and I keep some codes of some files in it. Format is the following:

code1, a/b/c/1.jpg
code2, a/b/c/2.jpg
code1, a/b/c/3.jpg
code2, a/b/d/4.jpg
code3, a/b/d/5.jpg

My purpose is to find the files that share the same code within the same folder (duplicates). All the file names are different. If the same code occurs in different folders, like code2, a/b/c/ and code2, a/b/d/, I want to skip it. Right now I have the following code, which searches for a given code across the whole *.txt document:

reader = csv.reader(csvfile)
dataDict = dict()
for row in reader:
    if any(row):
        if row[0] in dataDict:
            dataDict[row[0]].append(row[1])
        else:
            dataDict[row[0]] = [row[1]]

But this collects duplicates across different folders as well, whereas I only want the duplicate files that are in exactly the same folder.

Edit: The title is not clear; I did not know how to describe this in the title.

Upvotes: 2

Views: 67

Answers (1)

Landar

Reputation: 276

import csv

def find_duplicates(csvfile):
    reader = csv.reader(csvfile)
    dataDict = dict()
    for row in reader:
        if any(row):
            code, filename = row
            filename = filename.strip()            # drop the space after the comma
            dir_path, _ = filename.rsplit('/', 1)  # keep the folder, discard the file name
            if dir_path not in dataDict:
                dataDict[dir_path] = {}
            if code not in dataDict[dir_path]:
                dataDict[dir_path][code] = []
            dataDict[dir_path][code].append(filename)
    # Collect every group of files that share both folder and code
    duplicates = []
    for k_dir, codes in dataDict.items():
        for _, paths in codes.items():
            if len(paths) > 1:
                duplicates.append(paths)
    return duplicates

The first part groups the files by folder and then by code.

The second part detects the groups with more than one file (the duplicates) and returns them.
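For reference, a minimal usage sketch, assuming the question's sample data is saved as codes.txt and the code above is kept as the find_duplicates function:

# Hypothetical usage: codes.txt holds the lines shown in the question
with open('codes.txt', newline='') as csvfile:
    print(find_duplicates(csvfile))
# Expected output for the sample data:
# [['a/b/c/1.jpg', 'a/b/c/3.jpg']]

Only code1 repeats inside a/b/c; code2 appears in two different folders, so it is skipped as the question requires.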

Upvotes: 2
