Reputation: 95
I have written this code to walk a directory and find files bigger than 300 MB.
However, I get a lot of duplicate values, and the number of duplicates varies from file to file. Can anyone explain this or improve the code for me?
```python
import os
path = 'C:\\Users\\brentond\\Desktop\\Lower Thames Crossing'
for foldername, subfolders, filenames in os.walk(path):
    for subfolder in subfolders:
        for filename in filenames:
            if os.path.getsize(os.path.join(foldername, filename))>300000000:
                print(foldername + '\\' + filename)
```
Upvotes: 2
Views: 618
Reputation: 1364
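Skip the subfolders loop.
os.walk already descends into subfolders for you.
Your extra loop repeats the filenames loop once per subfolder, so each file is printed once for every immediate subfolder of the folder it sits in, and files in folders with no subfolders are never printed at all. That is why the number of duplicates varies.
Every folder appears as foldername exactly once. For each folder, its immediate child files appear in filenames, each once.
Its immediate child folders appear in subfolders, each once. You do not need to loop over subfolders unless you want to do something to the folders themselves rather than check their contents.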
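A small self-contained check of this, assuming a throwaway tree built with `tempfile` (the folder and file names and the `count_reports` helper are hypothetical). It counts how often each file path is reported by the question's nested loop versus the flat loop, with the size check left out:

```python
import os
import tempfile
from collections import Counter


def count_reports(top, nested):
    """Count how often each file path is reported while walking `top`.

    nested=True mimics the question's extra `for subfolder in subfolders` loop;
    nested=False is the flat loop. The size check is omitted for clarity.
    """
    seen = Counter()
    for foldername, subfolders, filenames in os.walk(top):
        if nested:
            # Repeats the filenames loop once per immediate subfolder
            for _subfolder in subfolders:
                for filename in filenames:
                    seen[os.path.join(foldername, filename)] += 1
        else:
            for filename in filenames:
                seen[os.path.join(foldername, filename)] += 1
    return seen


with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: top holds root.txt plus subfolders a and b;
    # a holds leaf.txt and has no subfolders of its own.
    os.makedirs(os.path.join(top, "a"))
    os.makedirs(os.path.join(top, "b"))
    for rel in ("root.txt", os.path.join("a", "leaf.txt")):
        with open(os.path.join(top, rel), "w") as f:
            f.write("x")

    nested = count_reports(top, nested=True)
    flat = count_reports(top, nested=False)
    root_file = os.path.join(top, "root.txt")
    leaf_file = os.path.join(top, "a", "leaf.txt")
    print(nested[root_file], nested[leaf_file])  # 2 0
    print(flat[root_file], flat[leaf_file])      # 1 1
```

With the nested loop, root.txt is reported twice (its folder has two subfolders) and leaf.txt never (its folder has none); the flat loop reports each file exactly once.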
Upvotes: 1
Reputation: 24232
You don't have to explore the subfolders yourself; os.walk does it for you.
From the docs:
os.walk(top, topdown=True, onerror=None, followlinks=False)
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
(emphasis mine)
So, just do:
```python
import os

path = 'C:\\Users\\brentond\\Desktop\\Lower Thames Crossing'
for foldername, subfolders, filenames in os.walk(path):
    for filename in filenames:
        filepath = os.path.join(foldername, filename)
        # 300,000,000 bytes = 300 MB (about 286 MiB)
        if os.path.getsize(filepath) > 300000000:
            print(filepath)
```
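If you want something reusable, here is a minimal sketch, assuming you would rather collect the paths than print them; the function name `find_large_files` and its `min_size` parameter are hypothetical. It builds each path once with `os.path.join` and skips files whose size cannot be read (for example a broken symlink or a file that is locked or deleted mid-walk), which would otherwise raise `OSError` and stop the loop above:

```python
import os


def find_large_files(top, min_size=300_000_000):
    """Return paths of files under `top` larger than `min_size` bytes.

    Hypothetical helper: same logic as the loop above, but it returns a list
    and skips files whose size cannot be read instead of raising.
    """
    large = []
    for foldername, _subfolders, filenames in os.walk(top):
        for filename in filenames:
            filepath = os.path.join(foldername, filename)
            try:
                size = os.path.getsize(filepath)
            except OSError:
                # e.g. a broken symlink, or a file deleted/locked mid-walk
                continue
            if size > min_size:
                large.append(filepath)
    return large


if __name__ == "__main__":
    path = r'C:\Users\brentond\Desktop\Lower Thames Crossing'
    for filepath in find_large_files(path):
        print(filepath)
```

Returning a list keeps the search separate from the output, so the same function can feed a report, a sort by size, or a delete step.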
Upvotes: 6