Breno Lehmann

Reputation: 21

Deleting a large number of files from a specific list of file names (Python)

I need to delete a large number of files given a list of their names (2450 files out of a total of 10015). The code I'm using works, but it takes far too long to do the job; it's clearly not optimized. Does anyone have a better way to deal with this problem?

import os
import fnmatch

os.chdir(directoryPath)
for filename in os.listdir(r'D:\Python\Jupyter\IP_Project\DataBase'):  # raw string so backslashes stay literal
    for pattern in ['ISIC_0024396*', 'ISIC_0024630*', 'ISIC_0024672*',
                    'ISIC_0024700*', 'ISIC_0024771*', 'ISIC_0024834*',
                    'ISIC_0024869*', 'ISIC_0024918*', 'ISIC_0024962*',
                    'ISIC_0024998*', 'ISIC_0025005*', 'ISIC_0025040*',
                    'ISIC_0025046*', 'ISIC_0025064*', 'ISIC_0025073*',
                    'ISIC_0025112*', 'ISIC_0025152*', 'ISIC_0025168*',
                    'ISIC_0025170*', 'ISIC_0025193*', 'ISIC_0025208*',
                    'ISIC_0025231*', 'ISIC_0025297*', 'ISIC_0025322*',
                    'ISIC_0034319*', 'ISIC_0034320*']:
        if fnmatch.fnmatch(filename, pattern):
            os.remove(filename)

Note: I reduced the number of file names in the code above to better illustrate the idea, but as I said, the real list has 2450 filenames.

Thanks for the tips!

Upvotes: 1

Views: 89

Answers (2)

Breno Lehmann

Reputation: 21

Thanks everyone for the tips. For my specific problem I can solve it in a much simpler way, by removing the files directly as you indicated:

import os

os.chdir(directory_path)
for filename in repeated_images:
    os.remove(filename)
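A slightly more defensive variant (just a sketch, reusing the same directory_path and repeated_images names) joins the directory onto each name instead of calling os.chdir, and skips entries that are already gone:

import os

# Delete each listed file; skip names that no longer exist on disk.
for filename in repeated_images:
    try:
        os.remove(os.path.join(directory_path, filename))
    except FileNotFoundError:
        pass  # already removed (or never present); carry on

Joining the path explicitly also keeps the deletion independent of the process's current working directory.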

Upvotes: 1

James Kent

Reputation: 5933

As stated in my comment above, you currently keep checking the remaining patterns even after you find a match. In this case no further pattern can match, and since the file has already been removed there is nothing useful left to do anyway, so you can break after a match:

import os
import fnmatch

os.chdir(directoryPath)
for filename in os.listdir(r'D:\Python\Jupyter\IP_Project\DataBase'):
    for pattern in ['ISIC_0024396*', 'ISIC_0024630*', 'ISIC_0024672*',
                    'ISIC_0024700*', 'ISIC_0024771*', 'ISIC_0024834*',
                    'ISIC_0024869*', 'ISIC_0024918*', 'ISIC_0024962*',
                    'ISIC_0024998*', 'ISIC_0025005*', 'ISIC_0025040*',
                    'ISIC_0025046*', 'ISIC_0025064*', 'ISIC_0025073*',
                    'ISIC_0025112*', 'ISIC_0025152*', 'ISIC_0025168*',
                    'ISIC_0025170*', 'ISIC_0025193*', 'ISIC_0025208*',
                    'ISIC_0025231*', 'ISIC_0025297*', 'ISIC_0025322*',
                    'ISIC_0034319*', 'ISIC_0034320*']:
        if fnmatch.fnmatch(filename, pattern):
            os.remove(filename)
            break  # matched and removed; move on to the next file

In theory this reduces the time taken to process all of the files by roughly half (assuming an even distribution of names per pattern).
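Going further: every pattern in this list is a fixed stem followed by *, so the inner loop over all 2450 patterns can be replaced by a single set lookup per file, taking the per-file cost from O(number of patterns) down to O(1). A sketch of that idea (assuming all patterns really have this prefix form; patterns stands for the full list above):

import os

# Strip the trailing '*' to recover the bare stems, e.g. 'ISIC_0024396',
# and keep them in a set for constant-time membership tests.
stems = {pattern.rstrip('*') for pattern in patterns}

for filename in os.listdir(directoryPath):
    stem, _ext = os.path.splitext(filename)  # 'ISIC_0024396.jpg' -> ('ISIC_0024396', '.jpg')
    if stem in stems:
        os.remove(os.path.join(directoryPath, filename))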

Upvotes: 1
