JurgR

Reputation: 75

Python Loop from current file

I am trying to loop through the files in a directory, find duplicates, and delete them. I have 29,000 files in the directory, so a brute-force comparison would take more than a day.

I have filenames like these:

"some_file_name" "some-file-name"

So one name has underscores and the other has dashes, and sometimes the two are two or three positions apart in the directory listing.

So how do I have my inner loop start at the outer loop's position in the directory and make it check only the next 10?

Here is my brute force code:

import glob, os
os.chdir("C:/Dir/dir")

for file in glob.glob("*"):
    temp = file
    temp = temp.replace("-", " ")
    temp = temp.replace("_", " ")

    # How do I start this loop where file is currently at and continue for the next 10 files?
    for file2 in glob.glob("*"):
        temp2 = file2
        temp2 = temp2.replace("-", " ")
        temp2 = temp2.replace("_", " ")
        if temp == temp2:
            os.remove(file2)

Upvotes: 2

Views: 154

Answers (2)

mrCarnivore

Reputation: 5068

You could use a dictionary and put the "simple name" (without _ or -) as the key and all the real filenames as values:

import glob, os

def extendDictValue(dDict, sKey, uValue):
    if sKey in dDict:
        dDict[sKey].append(uValue)
    else:
        dDict[sKey] = [uValue]


os.chdir("C:/Dir/dir")
filenames_dict = {}
for filename in glob.glob("*"):
    simple_name = filename.replace("-", " ").replace("_", " ")
    extendDictValue(filenames_dict, simple_name, filename)

for simple_name, filenames in filenames_dict.items():
    if len(filenames) > 1:
        filenames.pop(0)
        for filename in filenames:
            os.remove(filename)
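
A variant of the same idea, as a minimal sketch: collections.defaultdict removes the need for the helper function, and returning the list of duplicates instead of deleting them right away allows a dry run first. The function name find_duplicates and the sample filenames are made up for illustration.

```python
from collections import defaultdict

def find_duplicates(filenames):
    """Return every filename whose normalized name was already seen earlier."""
    groups = defaultdict(list)
    for filename in filenames:
        # Treat "-" and "_" as the same character, as in the code above.
        simple_name = filename.replace("-", " ").replace("_", " ")
        groups[simple_name].append(filename)

    duplicates = []
    for names in groups.values():
        # Keep the first name of each group, report the rest.
        duplicates.extend(names[1:])
    return duplicates

# Hypothetical sample names; in practice pass glob.glob("*") instead.
sample = ["some_file_name", "some-file-name", "other_file", "some-file_name"]
print(find_duplicates(sample))  # ['some-file-name', 'some-file_name']
```

Once the printed list looks right, each entry can be passed to os.remove.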

Upvotes: 0

Tomalak

Reputation: 338118

From what I understand from your question, you want to delete similarly named files from a directory. I think your approach ("look at the next 10 filenames or so") is too imprecise and too complicated.

The condition is, when both a file some_file_name and a file some-file-name exist, delete one of them.

This can be done easily by collecting all filenames and, for each entry, checking whether a filename with underscores instead of dashes also exists; if it does, delete it.

The following uses a set to do this, because sets have very good look-up characteristics, i.e., some_value in some_set is much faster than the same test would be on a list. It also avoids excessive file-exists checks (like calling os.path.isfile(file)), since we already know every file that exists from building the set.
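
To illustrate the look-up difference, here is a rough sketch that times membership tests against a list and a set holding the same 29,000 made-up names. The function name time_lookups, the generated names, and the repeat count are assumptions for illustration; actual timings depend on the machine.

```python
import timeit

def time_lookups(n=29_000, repeats=200):
    """Time `in` against a list and a set that hold the same n names."""
    names_list = [f"file_{i}" for i in range(n)]
    names_set = set(names_list)
    target = names_list[-1]  # worst case for the list: the last element

    list_time = timeit.timeit(lambda: target in names_list, number=repeats)
    set_time = timeit.timeit(lambda: target in names_set, number=repeats)
    return list_time, set_time

list_time, set_time = time_lookups()
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

The list has to be scanned element by element, while the set uses a hash look-up, so the gap widens as the directory grows.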

import glob, os

filenames = {file for file in glob.glob(r"C:\Dir\dir\*")}

for file in filenames:
    delete_candidate = file.replace("-", "_")
    if delete_candidate != file and delete_candidate in filenames:
        os.remove(delete_candidate)
        print("deleted " + delete_candidate)

{x for x in iterable} is a set comprehension; it builds a set from the values of any iterable and works just like a list comprehension.
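
As a small illustration of that syntax (the sample names are made up):

```python
names = ["some_file_name", "some-file-name", "some_file_name"]

# A set comprehension drops repeated values, unlike a list comprehension.
unique_names = {name for name in names}
print(len(unique_names))  # 2

# With no transformation or filter, set(names) builds the same set.
print(unique_names == set(names))  # True

# The expression part can transform each value on the way in.
normalized = {name.replace("-", "_") for name in names}
print(normalized)  # {'some_file_name'}
```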

Upvotes: 3
