Dan

Reputation: 95

Find files bigger than 300MB using os.walk in Python?

I have written this code to walk a directory and find files bigger than 300MB.

However, I get a lot of duplicated results, and the number of duplicates varies from file to file. Can anyone explain this or improve the code for me?

import os

path = 'C:\\Users\\brentond\\Desktop\\Lower Thames Crossing'
for foldername, subfolders, filenames in os.walk(path):
    for subfolder in subfolders:
        for filename in filenames:
            if os.path.getsize(os.path.join(foldername, filename))>300000000:
                print(foldername + '\\' + filename)

Upvotes: 2

Views: 618

Answers (2)

Josiah

Reputation: 1364

Skip the subfolders loop.

Walk already goes through subfolders.

Every folder in the tree appears as foldername exactly once. For each folder, its immediate child files appear in filenames, each once.

Its immediate child folders appear in subfolders, each once. You do not need to loop over subfolders unless you want to do something with the folder itself other than check its contents.
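For example, printing each tuple that walk yields makes this structure visible (a minimal sketch; the path here is a placeholder, substitute your own):

import os

# Sketch only: 'C:\\some\\test\\tree' is a hypothetical path.
# Each directory in the tree is yielded exactly once as foldername,
# together with its immediate child folders and child files.
for foldername, subfolders, filenames in os.walk('C:\\some\\test\\tree'):
    print('folder:', foldername)
    print('  immediate subfolders:', subfolders)
    print('  immediate files:', filenames)

Your version repeats the inner filenames loop once for every entry in subfolders, which is where the varying number of duplicates comes from.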

Upvotes: 1

Thierry Lathuille

Reputation: 24232

You don't have to explore the subfolders yourself; walk does it for you.

From the doc:

os.walk(top, topdown=True, onerror=None, followlinks=False)

Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).

(emphasis mine)

So, just do:

import os

path = 'C:\\Users\\brentond\\Desktop\\Lower Thames Crossing'
for foldername, subfolders, filenames in os.walk(path):
    # filenames already lists every file directly inside foldername
    for filename in filenames:
        if os.path.getsize(os.path.join(foldername, filename)) > 300000000:
            print(foldername + '\\' + filename)

Upvotes: 6
