Reputation: 49
I need to copy files from a temp directory into new sub-directories, and it must be done in a function. I created the new directories and checked that they exist. The source directory contains multiple files. Since these are temp directories, I don't think I am using relative paths. My function keeps giving me this error:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-24-c7f9666ed031> in <module>()
38
39 split_size = .9
---> 40 split_data(CAT_SOURCE_DIR, TRAINING_CATS_DIR, TESTING_CATS_DIR, split_size)
41 split_data(DOG_SOURCE_DIR, TRAINING_DOGS_DIR, TESTING_DOGS_DIR, split_size)
42
2 frames
/usr/lib/python3.6/genericpath.py in getsize(filename)
48 def getsize(filename):
49 """Return the size of a file, reported by os.stat()."""
---> 50 return os.stat(filename).st_size
51
52
FileNotFoundError: [Errno 2] No such file or directory: '3913.jpg'
This is my function. I even tried to add a line that tells the function not to copy a file that doesn't exist.
def split_data(SOURCE, TRAINING, TESTING, SPLIT_SIZE):
    source_files = [f for f in os.listdir(SOURCE) if os.path.getsize(f) > 0]
    source_files = [f for f in os.listdir(SOURCE) if os.path.exists(f)]
    random.shuffle(source_files)
    total = len(source_files)
    to_training = source_files[0: int(total * SPLIT_SIZE)]
    to_test = source_files[int(total * SPLIT_SIZE):]
    for f in to_training:
        copyfile(os.path.join(SOURCE, f), TRAINING)
    for f in to_test:
        copyfile(os.path.join(SOURCE, f), TESTING)
    assert len(source_files) == len(to_training) + len(to_test)
When I check the length of the source directory listing, it is full of images, and when I check with os.path.isdir() whether my folders were created properly, they were. I have no idea how to solve this issue. Please help.
Upvotes: 1
Views: 763
Reputation: 4598
You are not passing a valid path to os.path.getsize, just the name of a file in some directory. From the documentation of the function os.listdir:
Return a list containing the names of the entries in the directory given by path.
These names are not paths. To get a proper path to each file, you have to join its name onto the directory you passed to os.listdir:
source_files = [f for f in os.listdir(SOURCE) if os.path.getsize(os.path.join(SOURCE, f)) > 0]
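To make the difference between a name and a path concrete, here is a minimal sketch. The temporary directory and the file name example.jpg are made up for illustration and are not from the question:

```python
import os
import tempfile

# Hypothetical setup: a temporary directory holding one non-empty file.
with tempfile.TemporaryDirectory() as source:
    with open(os.path.join(source, "example.jpg"), "wb") as fh:
        fh.write(b"not really an image")

    # os.listdir returns bare entry names, not paths.
    names = os.listdir(source)

    # A bare name is resolved against the current working directory, so
    # this is True only if a file of that name also happens to exist there.
    found_by_name = os.path.exists(names[0])

    # Joining the name onto the directory gives a path that resolves.
    full_path = os.path.join(source, names[0])
    size = os.path.getsize(full_path)
```

This is also why the os.path.exists filter in the question does not help: it tests the same bare names, so it most likely returns False for every file rather than fixing the lookup.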
However, because you are basically calling os.stat on each file in the directory, it can be more efficient to use os.scandir instead. From the documentation:
Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory.
That means you can do:
source_files = [f.path for f in os.scandir(SOURCE) if f.stat().st_size > 0]
This form will also set source_files to a list of paths to files, so you don't have to call os.path.join later. If you really just want the file name, you can replace f.path with f.name in the comprehension. You can find more about what f is in the documentation for os.DirEntry.
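Putting this together, one possible version of the whole function is sketched below. It is not a definitive implementation. Three things go beyond what is described above and are my own assumptions: the is_file() check that skips sub-directories, the return value, and joining the file name onto the destination directory. The last one is there because shutil.copyfile expects the complete target file name rather than a directory.

```python
import os
import random
from shutil import copyfile


def split_data(source, training, testing, split_size):
    """Copy a shuffled split of the non-empty files in `source`."""
    # Keep bare names of regular, non-empty files (entry.name, not entry.path),
    # because the name is needed again to build the destination path.
    names = [
        entry.name
        for entry in os.scandir(source)
        if entry.is_file() and entry.stat().st_size > 0
    ]
    random.shuffle(names)

    cut = int(len(names) * split_size)
    to_training, to_testing = names[:cut], names[cut:]

    # Assumption: copyfile needs a full target file name, so the name is
    # joined onto the destination directory as well as the source directory.
    for name in to_training:
        copyfile(os.path.join(source, name), os.path.join(training, name))
    for name in to_testing:
        copyfile(os.path.join(source, name), os.path.join(testing, name))

    # Returned only for inspection; the original function returned nothing.
    return to_training, to_testing
```

It would be called the same way as in the question, for example split_data(CAT_SOURCE_DIR, TRAINING_CATS_DIR, TESTING_CATS_DIR, split_size).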
Upvotes: 2