Reputation: 89
Why does the root element returned from os.walk() show / as the directory separator but os.sep (or os.path.sep) shows \ on Win10?
I'm just trying to create the complete path for a set of files in a folder as follows:
import os
base_folder = "c:/data/MA Maps"
for root, dirs, files in os.walk(base_folder):
    for f in files:
        if f.endswith(".png") and f.find("_N") != -1:
            print(os.path.join(root, f))
print(os.path.sep)
Here's what I get as an output:
c:/data/MA Maps\Map_of_Massachusetts_Nantucket_County.png
c:/data/MA Maps\Map_of_Massachusetts_Norfolk_County.png
\
I understand that some of Python's library functions (like open()) accept mixed path separators, at least on Windows, but I don't think that behavior can be relied on across all libraries. It seems like the values returned from os.walk() and os.path (.sep or .join()) should be consistent for the operating system in use. Can anyone explain why this inconsistency happens?
P.S. I know there is a more consistent library for working with file paths (and lots of other file manipulation) called pathlib, introduced in Python 3.4, and it does seem to fix all this. If your code runs on 3.4 or later, is it best to use pathlib to resolve this issue? And if your code targets systems running a Python older than 3.4, what is the best way to address it?
Here's a good basic explanation of pathlib: Python 3 Quick Tip: The easy way to deal with file paths on Windows, Mac and Linux
Here's my code & result using pathlib:
import os
from pathlib import Path
# All of this should work properly for any OS. I'm running Win10.
# You can even mix up the separators used (i.e."c:\data/MA Maps") and pathlib still
# returns the consistent result given below.
base_folder = "c:/data/MA Maps"
for root, dirs, files in os.walk(base_folder):
    # This converts the root path to one using the current operating system's
    # path separator (\ for Win10).
    root_folder = Path(root)
    for f in files:
        if f.endswith(".png") and f.find("_N") != -1:
            # The / operator, when used with a pathlib object, just joins the
            # path segments using the current operating system's path separator.
            print(root_folder / f)
c:\data\MA Maps\Map_of_Massachusetts_Nantucket_County.png
c:\data\MA Maps\Map_of_Massachusetts_Norfolk_County.png
This can be done even more succinctly using only pathlib and a list comprehension (with all path separators handled correctly for the OS in use):
from pathlib import Path
base_folder = "c:/data/MA Maps"
path = Path(base_folder)
files = [item for item in path.iterdir() if item.is_file() and
         str(item).endswith(".png") and
         (str(item).find("_N") != -1)]
for file in files:
    print(file)
c:\data\MA Maps\Map_of_Massachusetts_Nantucket_County.png
c:\data\MA Maps\Map_of_Massachusetts_Norfolk_County.png
This is very Pythonic and, at least to me, quite easy to read and understand. .iterdir() makes dealing with files and directories reasonably easy, and in a cross-platform way. What do you think?
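One caveat with the version above: iterdir() only lists the immediate contents of the folder, whereas os.walk() also descends into subfolders. If recursion is needed, Path.rglob() looks like the closest pathlib counterpart. A rough sketch, assuming the same file-name filter as above; the function name find_pngs_recursive and its marker parameter are made up for illustration:

```python
from pathlib import Path


def find_pngs_recursive(base_folder, marker="_N"):
    """Return .png files anywhere under base_folder whose names contain marker."""
    base = Path(base_folder)
    # rglob() searches the whole tree below base. The marker is checked
    # against item.name so that a folder name containing it does not match.
    return sorted(
        item for item in base.rglob("*.png")
        if item.is_file() and marker in item.name
    )
```

Calling find_pngs_recursive("c:/data/MA Maps") and printing each item should give paths that use the separators of the OS it runs on, the same as the iterdir() version.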
Upvotes: 0
Views: 862
Reputation: 104792
The os.walk function always yields the initial part of the dirpath unchanged from what you pass in to it. It doesn't try to normalize the separators itself; it just keeps what you've given it. It does use the system-standard separators for the rest of the path, because it joins each subdirectory's name to the root directory with os.path.join. You can see the current version of the implementation of the os.walk function in the CPython source repository.
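The effect is easy to reproduce without walking a real directory. The sketch below uses ntpath, the Windows implementation that sits behind os.path, so it should behave the same on any OS. The file name is just one taken from the question's output.

```python
import ntpath  # Windows flavour of os.path; importable on every platform

base_folder = "c:/data/MA Maps"

# join() only inserts a separator between components; it does not rewrite
# the separators already present in the first argument.
joined = ntpath.join(base_folder, "Map_of_Massachusetts_Norfolk_County.png")
print(joined)

# normpath() is the call that converts forward slashes to backslashes.
normalized = ntpath.normpath(joined)
print(normalized)
```

The first line printed still has the forward slashes from base_folder followed by a single backslash, which matches the mixed output in the question. The second uses backslashes throughout.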
One option for normalizing the separators in your output is to normalize the base path you pass in to os.walk, perhaps using pathlib. If you normalize the initial path, all the output should use the system path separators automatically, since it will be the normalized path that will be preserved through the recursive walk, rather than the non-standard one. Here's a very basic transformation of your first code block to normalize the base_folder using pathlib, while preserving all the rest of the code, in its simplicity. Whether it's better than your version using more of pathlib's features is a judgement call that I'll leave up to you.
import os
from pathlib import Path
base_folder = Path("c:/data/MA Maps") # this will be normalized when converted to a string
for root, dirs, files in os.walk(base_folder):
    for f in files:
        if f.endswith(".png") and f.find("_N") != -1:
            print(os.path.join(root, f))
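For code that has to run on Python versions without pathlib, os.path.normpath can do the same job on the base path before it goes to os.walk. A minimal sketch, assuming the same filter as in the question; the helper name find_pngs and its marker parameter are made up for illustration:

```python
import os


def find_pngs(base_folder, marker="_N"):
    """Yield full paths of .png files whose names contain marker."""
    # normpath() rewrites "/" as os.sep on Windows and collapses things like
    # "a//b" or "a/./b"; on POSIX systems the separator is already "/".
    base_folder = os.path.normpath(base_folder)
    for root, dirs, files in os.walk(base_folder):
        for f in files:
            if f.endswith(".png") and marker in f:
                yield os.path.join(root, f)


if __name__ == "__main__":
    for path in find_pngs("c:/data/MA Maps"):
        print(path)
```

Because the normalized string is what os.walk carries through the recursion, every path it yields should use the system separator, with no dependency on pathlib.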
Upvotes: 4