Reputation: 95
I was watching an online Udacity course on deep learning, and the exercise was a simple classification task on the notMNIST dataset. Everything was explained very well, but I am a bit confused by parts of the code given. I would appreciate it if you had the time to give me a hand!
For example, we have a 'notMNIST_large.tar.gz' file. First we remove .tar.gz, so the root is root = notMNIST_large. After that we check whether a directory with this name already exists. If not, we extract the subfolders from the 'notMNIST_large.tar.gz' file, and this is where I get a bit confused...
num_classes = 10
np.random.seed(133)

def maybe_extract(filename, force=False):
    root = os.path.splitext(os.path.splitext(filename)[0])[0]  # remove .tar.gz
    if os.path.isdir(root) and not force:
        # You may override by setting force=True.
        print('%s already present - Skipping extraction of %s.' % (root, filename))
    else:
        print('Extracting data for %s. This may take a while. Please wait.' % root)
        tar = tarfile.open(filename)
        sys.stdout.flush()
        tar.extractall(data_root)
        tar.close()
    data_folders = [
        os.path.join(root, d) for d in sorted(os.listdir(root))
        if os.path.isdir(os.path.join(root, d))]
    if len(data_folders) != num_classes:
        raise Exception(
            'Expected %d folders, one per class. Found %d instead.' % (
                num_classes, len(data_folders)))
    print(data_folders)
    return data_folders

train_folders = maybe_extract(train_filename)
test_folders = maybe_extract(test_filename)
So I would like, if possible, an explanation of this part:
data_folders = [
    os.path.join(root, d) for d in sorted(os.listdir(root))
    if os.path.isdir(os.path.join(root, d))]
if len(data_folders) != num_classes:
    raise Exception(
        'Expected %d folders, one per class. Found %d instead.' % (
            num_classes, len(data_folders)))
Upvotes: 1
Views: 662
Reputation: 189497
It collects a list of subdirectories and checks that there are as many as it expected.
data_folders = [thing(d) for d in something() if predicate(d)]
is a list comprehension which loops over the result of something() and keeps those items for which predicate(d) is True. It applies thing() to each of those items and collects the results in the list data_folders.
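To make that pattern concrete, here is a minimal sketch showing the comprehension next to the explicit loop it stands for. The functions something, predicate, and thing are placeholders I made up for illustration (they are not part of the course code), with toy bodies that only mimic what os.listdir, os.path.isdir, and os.path.join do.

```python
def something():
    # Placeholder source of items (stands in for sorted(os.listdir(root))).
    return ["b", "a.txt", "c"]

def predicate(d):
    # Placeholder filter (stands in for os.path.isdir(...)).
    return "." not in d

def thing(d):
    # Placeholder transformation (stands in for os.path.join(root, d)).
    return "root/" + d

# List comprehension form.
data_folders = [thing(d) for d in something() if predicate(d)]

# Equivalent explicit loop.
data_folders_loop = []
for d in something():
    if predicate(d):
        data_folders_loop.append(thing(d))
```

Both forms build the same list; the comprehension is just the more compact way to write filter-then-transform.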
Here, something is the sorted listing of the entries in the root directory, and predicate checks that the item is a directory (and not, for example, a regular file); thing is os.path.join(root, d), i.e. we add the root directory back in front of each entry, because os.listdir returns bare names without the path.
So, in this case, the code checks that the number of subdirectories is the same as the number of classes (presumably each subdirectory contains a class).
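If you want to see the check in action without downloading the dataset, the following sketch runs the same logic against a throwaway directory. The helper name list_class_folders and the folder names "A", "B", "C" are my own assumptions for the demo; only the comprehension and the length check are taken from the question.

```python
import os
import tempfile

def list_class_folders(root, num_classes):
    """Return the sorted subdirectories of root, checking the expected count."""
    data_folders = [
        os.path.join(root, d) for d in sorted(os.listdir(root))
        if os.path.isdir(os.path.join(root, d))]
    if len(data_folders) != num_classes:
        raise Exception(
            'Expected %d folders, one per class. Found %d instead.' % (
                num_classes, len(data_folders)))
    return data_folders

with tempfile.TemporaryDirectory() as root:
    # Three class folders (hypothetical names) plus one regular file.
    for name in ("A", "B", "C"):
        os.mkdir(os.path.join(root, name))
    open(os.path.join(root, "notes.txt"), "w").close()

    # The regular file is filtered out by the isdir test.
    folders = list_class_folders(root, num_classes=3)
    names = [os.path.basename(f) for f in folders]

    # Asking for a different number of classes triggers the exception.
    try:
        list_class_folders(root, num_classes=10)
        mismatch_raised = False
    except Exception:
        mismatch_raised = True
```

With three subdirectories present, the call with num_classes=3 succeeds and the call with num_classes=10 raises, which is the same safeguard maybe_extract applies after extraction.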
Upvotes: 2