Reputation: 4810
I'm trying to count the number of files in a directory and its subdirectories, but I'm getting the wrong answer.
I have a folder named train which contains 10 sub-folders. Each sub-folder contains 900 files.
When I count the files with the following code, I get 0 files, which is wrong (it should be 9000).
It seems that isfile doesn't work.
What am I missing?
    import os

    TRAIN_IMAGES_DIR = 'C:\\test\\train\\'
    NUM_OF_FILES = 0
    for subdir, dirs, files in os.walk(TRAIN_IMAGES_DIR):
        for file in files:
            print(file)
            if os.path.isfile(file):
                NUM_OF_FILES = NUM_OF_FILES + 1
            else:
                print("Folder: ", file)
    print(NUM_OF_FILES)
I'm using Python 3.7.
Upvotes: 1
Views: 242
Reputation: 46301
Path.glob() from pathlib is slower, but handy when you don't need top speed.
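    from pathlib import Path

    def __get_files(p):
        p = Path(p)  # accepts a str or a Path
        # "**/*" also matches directories, so keep only regular files
        return (f for f in p.glob("**/*") if f.is_file())

    # a raw string cannot end in a backslash, so the trailing one is dropped
    gen = __get_files(p=r"C:\Users\dj")
    for f in gen:
        print(f)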
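Since the question asks for a count rather than a listing, the same pathlib approach can be turned into a counter. This is only a sketch: the function name count_files_pathlib and the temporary directory layout are made up for illustration, and rglob("*") is used as the equivalent of glob("**/*").

```python
from pathlib import Path
import tempfile


def count_files_pathlib(root):
    """Count regular files below root, skipping directories."""
    return sum(1 for entry in Path(root).rglob("*") if entry.is_file())


# Hypothetical layout for illustration: 2 sub-folders with 2 files each.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for folder in ("a", "b"):
        (root / folder).mkdir()
        for i in range(2):
            (root / folder / f"img_{i}.png").touch()

    total = count_files_pathlib(root)

print(total)
```

The is_file() check matters here because the glob pattern yields the sub-folders themselves as well as their contents.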
Upvotes: 0
Reputation: 1121356
You don't need to use isfile() at all, because os.walk() has already separated directories from files for you. When done correctly, the test will be True for all elements of the files list.
What goes wrong is that each filename is relative; it is just the last element of the path. os.path.isfile() can only look in the current working directory for such names, and that's not where those files are actually located. You'd have to use os.path.join(subdir, file) to turn the bare filename into a full path.
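To make the difference concrete, here is a minimal sketch that builds a throwaway directory tree and runs both checks. The helper count_with_isfile and the temporary layout are hypothetical, invented only for this illustration.

```python
import os
import tempfile


def count_with_isfile(root, join_path):
    """Count names from os.walk() that pass os.path.isfile().

    join_path=False mimics the question's code (bare filename);
    join_path=True joins each name with its containing directory first.
    """
    count = 0
    for subdir, dirs, files in os.walk(root):
        for name in files:
            candidate = os.path.join(subdir, name) if join_path else name
            if os.path.isfile(candidate):
                count += 1
    return count


# Hypothetical layout for illustration: 2 sub-folders with 3 files each.
with tempfile.TemporaryDirectory() as root:
    for folder in ("a", "b"):
        os.makedirs(os.path.join(root, folder))
        for i in range(3):
            open(os.path.join(root, folder, f"img_{i}.png"), "w").close()

    bare = count_with_isfile(root, join_path=False)
    joined = count_with_isfile(root, join_path=True)

print("bare names:", bare)
print("joined paths:", joined)
```

With joined paths every file is found. With bare names the count only includes names that also happen to exist in the current working directory, which is normally none of them.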
But, again, don't use isfile(); that's just double work. os.walk() has already sorted out the files for you.
The following will work to count your files:
    NUM_OF_FILES = 0
    for subdir, dirs, files in os.walk(TRAIN_IMAGES_DIR):
        NUM_OF_FILES = NUM_OF_FILES + len(files)
This works because you only need to know the length of each files list here. You can also write NUM_OF_FILES += len(files) to add the length.
Even shorter, using the sum() function and a generator expression:
NUM_OF_FILES = sum(len(files) for _, _, files in os.walk(TRAIN_IMAGES_DIR))
If this produces a higher number than expected, then you have more files than you thought. For example, you may have hidden files (on POSIX systems, any file whose name starts with . is hidden from directory listings unless you use ls -a or similar techniques to reveal them).
You could perhaps filter your files first, on a filename extension; os.path.splitext(file) can give you a (base, ext) tuple for that. Or just filter out the names for which file[0] == "." is true.
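The two filters mentioned above could be combined roughly like this. It is a sketch under assumptions: the function count_files, its extensions and skip_hidden parameters, and the sample file names are all hypothetical and not part of the question's setup.

```python
import os
import tempfile


def count_files(root, extensions=None, skip_hidden=True):
    """Count files under root using os.walk().

    extensions: set of lowercase extensions such as {".png", ".jpg"},
                or None to accept any extension.
    skip_hidden: ignore names that start with a dot.
    """
    total = 0
    for _, _, files in os.walk(root):
        for name in files:
            if skip_hidden and name.startswith("."):
                continue
            ext = os.path.splitext(name)[1].lower()
            if extensions is not None and ext not in extensions:
                continue
            total += 1
    return total


# Hypothetical sample data: two images, one text file, one hidden file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "cats"))
    for name in ("a.png", "b.JPG", "notes.txt", ".DS_Store"):
        open(os.path.join(root, "cats", name), "w").close()

    images = count_files(root, extensions={".png", ".jpg"})
    visible = count_files(root)
    everything = count_files(root, skip_hidden=False)

print(images, visible, everything)
```

Lowercasing the extension before comparing is a design choice made here so that b.JPG and b.jpg are treated the same; drop the .lower() call if case should matter.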
Upvotes: 3