erogol

Reputation: 13624

How can I use os.walk or any other alternative to traverse folders recursively by the natural name order?

In Python, I iterate over all folders recursively with os.walk to find every file with a given extension. This is my current code:

def get_data_paths(root_path, ext='*.jpg'):
    import os
    import fnmatch
    matches = []
    classes = []
    class_names = []
    for root, dirnames, filenames in os.walk(root_path):
        for filename in fnmatch.filter(filenames, ext):
            matches.append(os.path.join(root, filename))
            # The class name is the name of the folder that contains the file.
            class_name = os.path.basename(os.path.dirname(os.path.join(root, filename)))
            if class_name not in class_names:
                class_names.append(class_name)
            classes.append(class_names.index(class_name))

    print("There are %d files found." % len(matches))
    return matches, classes, class_names

However, the problem is that this function visits folders in a seemingly arbitrary order rather than by folder name. I would like to traverse them in A-Z (natural) order instead. How should I modify this code, or what alternative can I use to do this?

Upvotes: 0

Views: 378

Answers (2)

Mike DeSimone

Reputation: 42835

By default, the topdown parameter to os.walk is True, so the triple for a directory is yielded before os.walk descends into that directory's subdirectories. The docs state:

the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.

The key phrase is "impose a specific order of visiting". So all you need to do is something like:

import os
import natsort  # third-party package, not part of the standard library

for root, dirnames, filenames in os.walk(root_path):
    dirnames[:] = natsort.natsorted(dirnames)
    # continue with other directory processing...

The [:] slice assignment is required because the list must be edited in place. A plain dirnames = ... would only rebind the local name, and os.walk would never see the new order.
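As a minimal sketch of the in-place sort, here is a version that runs without natsort installed. The natural_key helper, the visit_order function and the folder names are hypothetical, made up for illustration; natural_key is only a rough stand-in for natsort.natsorted and ignores case by assumption.

```python
import os
import re
import tempfile


def natural_key(name):
    # Hypothetical helper standing in for natsort: re.split with a capturing
    # group yields text at even indexes and digit runs at odd indexes.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def visit_order(root_path):
    # Collect directory paths (relative to root_path) in visiting order.
    visited = []
    for root, dirnames, filenames in os.walk(root_path):
        # Slice assignment mutates the list that os.walk will recurse over.
        dirnames[:] = sorted(dirnames, key=natural_key)
        visited.append(os.path.relpath(root, root_path))
    return visited


# Throwaway tree with made-up folder names.
demo_root = tempfile.mkdtemp()
for name in ["run10", "run2", "run1"]:
    os.makedirs(os.path.join(demo_root, name))

print(visit_order(demo_root))
# ['.', 'run1', 'run2', 'run10']
```

A plain sorted(dirnames) would put run10 before run2; the key function is what makes the order natural.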


Here's an example of os.walk's operation. Given a directory tree that looks like:

$ ls -RF cm3mm/SAM3/src
Applets/            RTC.cc          SAM3X/
DBGUWriteString.cc  SAM3A/          SMC.cc.in
EEFC.cc             SAM3N/          SoftBoot.cc
Memories.txt        SAM3S/
PIO.cc              SAM3U/

cm3mm/SAM3/src/Applets:
AppletAPI.cc   IntFlash.cc   Main.cc        MessageSink.cc  Runtime.cc

cm3mm/SAM3/src/SAM3A:
Map.txt     Pins.txt

cm3mm/SAM3/src/SAM3N:
Map.txt     Pins.txt

cm3mm/SAM3/src/SAM3S:
Map.txt     Pins.txt

cm3mm/SAM3/src/SAM3U:
Map.txt     Pins.txt

cm3mm/SAM3/src/SAM3X:
Map.txt     Pins.txt

Now, let's see what os.walk does:

>>> import os
>>> for root, dirnames, filenames in os.walk("cm3mm/SAM3/src"):
...     print("-----")
...     print("root =", root)
...     print("dirnames =", dirnames)
...     print("filenames =", filenames)
...
-----
root = cm3mm/SAM3/src
dirnames = ['Applets', 'SAM3A', 'SAM3N', 'SAM3S', 'SAM3U', 'SAM3X']
filenames = ['DBGUWriteString.cc', 'EEFC.cc', 'Memories.txt', 'PIO.cc', 'RTC.cc', 'SMC.cc.in', 'SoftBoot.cc']
-----
root = cm3mm/SAM3/src/Applets
dirnames = []
filenames = ['AppletAPI.cc', 'IntFlash.cc', 'Main.cc', 'MessageSink.cc', 'Runtime.cc']
-----
root = cm3mm/SAM3/src/SAM3A
dirnames = []
filenames = ['Map.txt', 'Pins.txt']
-----
root = cm3mm/SAM3/src/SAM3N
dirnames = []
filenames = ['Map.txt', 'Pins.txt']
-----
root = cm3mm/SAM3/src/SAM3S
dirnames = []
filenames = ['Map.txt', 'Pins.txt']
-----
root = cm3mm/SAM3/src/SAM3U
dirnames = []
filenames = ['Map.txt', 'Pins.txt']
-----
root = cm3mm/SAM3/src/SAM3X
dirnames = []
filenames = ['Map.txt', 'Pins.txt']

Each time through the loop, you get the directories and files for one directory. We know exactly which file belongs to which folder: the files in filenames belong to the folder root.
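Putting this together with the question's function, one possible sketch looks like the following. It is an assumption-laden variant, not the asker's final code: get_data_paths_sorted and natural_key are hypothetical names, natural_key is a standard-library stand-in for natsort, and sorting the file names as well as the folder names is an extra step the question did not ask for.

```python
import fnmatch
import os
import re
import tempfile


def natural_key(name):
    # Hypothetical stand-in for natsort: digit runs compare as integers.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def get_data_paths_sorted(root_path, ext="*.jpg"):
    matches, classes, class_names = [], [], []
    for root, dirnames, filenames in os.walk(root_path):
        # Impose the visiting order for the subdirectories of root.
        dirnames[:] = sorted(dirnames, key=natural_key)
        # Every name in filenames belongs to the folder root.
        for filename in sorted(fnmatch.filter(filenames, ext), key=natural_key):
            path = os.path.join(root, filename)
            matches.append(path)
            class_name = os.path.basename(os.path.dirname(path))
            if class_name not in class_names:
                class_names.append(class_name)
            classes.append(class_names.index(class_name))
    return matches, classes, class_names


# Throwaway tree with made-up class folders and empty files.
demo_root = tempfile.mkdtemp()
layout = [("class10", ["b.jpg"]), ("class2", ["a2.jpg", "a10.jpg", "notes.txt"])]
for folder, files in layout:
    os.makedirs(os.path.join(demo_root, folder))
    for name in files:
        open(os.path.join(demo_root, folder, name), "w").close()

matches, classes, class_names = get_data_paths_sorted(demo_root)
print(class_names)  # ['class2', 'class10']
print(classes)      # [0, 0, 1]
```

Because class indexes are assigned in visiting order, sorting dirnames also makes the class numbering reproducible across machines.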

Upvotes: 2

erogol

Reputation: 13624

I changed the code like this:

def get_data_paths(root_path, ext='*.jpg'):
    import os
    import fnmatch
    import natsort  # import this
    matches = []
    classes = []
    class_names = []
    dir_list = natsort.natsorted(list(os.walk(root_path)))  # add this
    for root, dirnames, filenames in dir_list:
        for filename in fnmatch.filter(filenames, ext):
            matches.append(os.path.join(root, filename))
            class_name = os.path.basename(os.path.dirname(os.path.join(root, filename)))
            if class_name not in class_names:
                class_names.append(class_name)
            classes.append(class_names.index(class_name))

    print("There are %d files found." % len(matches))
    return matches, classes, class_names
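Sorting the finished walk is not quite the same as sorting dirnames in place: it builds the whole list first, and it orders entries by their full path string rather than folder by folder. The sketch below illustrates the difference with a hypothetical natural_key helper in place of natsort and made-up folder names. For a flat layout with one folder per class, both approaches should give the same order.

```python
import os
import re
import tempfile


def natural_key(name):
    # Hypothetical stand-in for natsort: digit runs compare as integers.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


# Throwaway tree: folder "a" with a subfolder "x", plus a sibling "a-b".
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "a", "x"))
os.makedirs(os.path.join(demo_root, "a-b"))

# Approach 1: materialize the whole walk, then sort entries by root path.
entries = sorted(os.walk(demo_root), key=lambda entry: natural_key(entry[0]))
by_path = [os.path.relpath(root, demo_root) for root, _, _ in entries]

# Approach 2: sort dirnames in place while walking.
in_place = []
for root, dirnames, filenames in os.walk(demo_root):
    dirnames[:] = sorted(dirnames, key=natural_key)
    in_place.append(os.path.relpath(root, demo_root))

print(by_path)   # "a-b" comes before "a/x": "-" sorts before the separator
print(in_place)  # "a/x" comes before "a-b": "a" is fully walked first
```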

Upvotes: -1
