Python Loop from current file

Question

I am trying to loop through the files in a directory, find duplicates, and delete them. I have 29,000 files in the directory, so a brute-force comparison will take more than a day.

I have filenames like the following:

"some_file_name" "some-file-name"

So one name has underscores and the other has dashes, and sometimes the two are two or three positions apart in the directory listing.

So how do I have my inner loop start at the outer loop's position in the directory and make it check only the next 10?

Here is my brute force code:

import glob, os
os.chdir("C:/Dir/dir")

for file in glob.glob("*"):
    temp = file
    temp = temp.replace("-", " ")
    temp = temp.replace("_", " ")

    # How do I start this loop where file is currently at and continue for the next 10 files?
    for file2 in glob.glob("*"):
        temp2 = file2
        temp2 = temp2.replace("-", " ")
        temp2 = temp2.replace("_", " ")
        if temp == temp2:
            os.remove(file2)

Tomalak · Accepted Answer

From what I understand from your question, you want to delete similarly named files from a directory. I think your approach ("look at the next 10 filenames or so") is too imprecise and too complicated.
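For reference, the loop the question literally asks for (start the inner loop just after the outer loop's position and look only at the next 10 entries) could be sketched roughly as follows. This is only an illustration on a plain list of names: `find_window_duplicates` and its `window` parameter are made-up names, and it assumes the list is already sorted, since `glob` does not promise a sorted result.

```python
def find_window_duplicates(names, window=10):
    """Return names that match an earlier name within `window` positions.

    Hypothetical helper for illustration; it only inspects names and
    deletes nothing.
    """
    def normalize(name):
        # Same normalization as the question's code: dashes and
        # underscores both become spaces.
        return name.replace("-", " ").replace("_", " ")

    doomed = set()
    for i, name in enumerate(names):
        if name in doomed:
            continue  # already marked as a duplicate of an earlier name
        key = normalize(name)
        # The slice starts right after the current position and covers
        # at most `window` entries.
        for other in names[i + 1 : i + 1 + window]:
            if normalize(other) == key:
                doomed.add(other)
    return doomed


names = sorted(["a-b.txt", "a_b.txt", "c.txt", "d_e.txt", "d-e.txt"])
print(sorted(find_window_duplicates(names)))
```

Any duplicate that sits further away than `window` entries is missed, which is the imprecision mentioned above.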

The condition is: when both a file some_file_name and a file some-file-name exist, delete one of them.

This can be done very easily by collecting all filenames and, for each one, checking whether a filename with underscores instead of dashes also exists. If it does, delete it.

The following uses a set to do this, because sets have very good look-up characteristics, i.e. some_value in some_set takes constant time on average, while the same test on a list has to scan the list. It also avoids excessive file-exists checks (like calling os.path.isfile(file)), since we already know all the files that exist from building the set.

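import glob, os

# All paths in the directory, collected once into a set for fast look-ups.
filenames = {file for file in glob.glob(r"C:\Dir\dir\*")}

for file in filenames:
    # The underscore spelling of this name; identical to file if it has no dashes.
    delete_candidate = file.replace("-", "_")
    if delete_candidate != file and delete_candidate in filenames:
        os.remove(delete_candidate)
        print("deleted " + delete_candidate)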

{x for x in iterable} is a set comprehension: it builds a set from the values of an iterable. It works just like a list comprehension.
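To try the set comprehension and the membership test without touching any files, the same check can be run on plain names as a dry run. This is a minimal sketch: `delete_candidates` is a hypothetical helper that is not part of the code above, and the sample names are invented.

```python
def delete_candidates(names):
    """Return the underscore names that also exist in a dashed spelling.

    Hypothetical dry-run helper; it only returns names and deletes nothing.
    """
    existing = {name for name in names}  # set comprehension, as above
    return {
        name.replace("-", "_")
        for name in existing
        # Only names that contain a dash can have an underscore twin.
        if "-" in name and name.replace("-", "_") in existing
    }


names = ["some-file-name", "some_file_name", "other_file", "plain"]
print(sorted(delete_candidates(names)))  # ['some_file_name']
```

Passing it bare names, for example from os.listdir, should also keep the replacement from touching dashes in the directory part of a path.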
