Reputation: 1065
I would like to find a way to traverse a directory tree in a bottom-up fashion using Python. The goal would be to find a single directory that would be an unknown distance above or below the current directory.
I made a diagram that I hope makes my question more clear:
The red box is the starting point and the green boxes are possible locations of the destination folder, of which only one needs to be found, not both.
If the destination box is /One/_One/_One, then I would expect the script to go up to /One, then traverse all the way down into /One/_One/_One.
If the destination folder is /Three, then I would expect the script to do the same as above, and then proceed to /Two and /Two/_Two, not finding it, and then finally reaching /Three. Alternatively, after checking all of /One, it would go to / and then find /Three, skipping the traversal of /Two.
Any help would be appreciated. I have been looking at all of the os.path and os.walk methods, but haven't found my solution yet.
Upvotes: 1
Views: 5409
Reputation: 9075
The key to this lies in the following documentation for os.walk
:
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames
Armed with that, you simply have to think of this problem as a tree-search where you start with the root at the first node, and every time you don't find the solution, you pop up a level and perform a tree search again, removing the node that was the root of your last search when you get there.
Let's say I have the following:
start_path = 'ABC0123/Comp/Scripts'
searching_for ='Some_File'
I can do the following:
last_root = start_path
current_root = start_path
found_path = None
while found_path is None and current_root:
pruned = False
for root, dirs, files in os.walk(current_root):
if not pruned:
try:
# Remove the part of the tree we already searched
del dirs[dirs.index(os.path.basename(last_root))]
pruned = True
except ValueError:
pass
if searching_for in files:
# found the file, stop
found_path = os.path.join(root, searching_for)
break
# Otherwise, pop up a level, search again
last_root = current_root
current_root = os.path.dirname(last_root)
The first iteration this should search the 'ABC0123/Comp/Scripts'
directory. Then, if it doesn't find 'Some_File'
, it will search the 'ABC0123/Comp'
directory, skipping the 'Scripts' directory. And then it will search the 'ABC0123'
directory, skipping 'Comp'
and everything underneath it.
Here's some example output of the traversal. CR
is current_root
, LR
is last_root
, and Exploring
is the root
in the current step of the walk
. In this case the file was in ABC0123/Paint/Output
:
CR: 'ABC0123/Comp/Scripts/', LR: 'ABC0123/Comp/Scripts/'
Exploring: 'ABC0123/Comp/Scripts/'
CR: 'ABC0123/Comp/Scripts', LR: 'ABC0123/Comp/Scripts/'
Exploring: 'ABC0123/Comp/Scripts'
CR: 'ABC0123/Comp', LR: 'ABC0123/Comp/Scripts'
Exploring: 'ABC0123/Comp'
Exploring: 'ABC0123/Comp/Output'
CR: 'ABC0123', LR: 'ABC0123/Comp'
Exploring: 'ABC0123'
Exploring: 'ABC0123/Lighting'
Exploring: 'ABC0123/Lighting/Output'
Exploring: 'ABC0123/Paint'
Exploring: 'ABC0123/Paint/Output'
>>> found_path
'ABC0123/Paint/Output/Some_File'
Also note that it's not really clear if you're searching for a directory or a file. My code assumes that latter, but if it's the former simply change:
if searching_for in files:
to
if searching_for in dirs:
But note that in both cases, it's assuming there is a single, globally (within the max tree depth) unique file/dir you are searching for, or that the first instance of that file/dir you come across is the one you're looking for. For example, as written you couldn't search for 'Paint/Output' specifically. You should be able to pretty easily figure out how to modify the search conditions to allow this however.
Upvotes: 7