Reputation: 1914
I feel that assigning files and folders and doing the += [item] part is a bit hackish. Any suggestions? I'm using Python 3.2.
from os import *
from os.path import *
def dir_contents(path):
    contents = listdir(path)
    files = []
    folders = []
    for i, item in enumerate(contents):
        if isfile(contents[i]):
            files += [item]
        elif isdir(contents[i]):
            folders += [item]
    return files, folders
Upvotes: 66
Views: 71331
Reputation: 71
For anyone who's looking for a "Path" way to walk:
from pathlib import Path
p = Path("some_path_you_want_to_walk")
for dirName, subdirList, fileList in p.walk():
    print(dirName, subdirList, fileList)
First introduced in Python 3.12: https://docs.python.org/zh-cn/3.13/library/pathlib.html#pathlib.Path.walk
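For code that also has to run on interpreters older than 3.12, one possible approach is to fall back to os.walk when Path.walk is missing. This is only a sketch: walk_paths is a hypothetical helper name, and it assumes you want the directory as a Path and the names as plain strings, which is what Path.walk yields.

```python
import os
from pathlib import Path


def walk_paths(top):
    """Yield (dirpath, dirnames, filenames) with dirpath as a Path."""
    top = Path(top)
    if hasattr(top, "walk"):
        # Python 3.12 and later: use the pathlib method directly
        yield from top.walk()
    else:
        # Older versions: wrap os.walk and convert the directory to a Path
        for dirpath, dirnames, filenames in os.walk(top):
            yield Path(dirpath), dirnames, filenames
```

Either branch yields the same kind of triple, so calling code does not need to know which Python version it runs on.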
Upvotes: 0
Reputation: 2006
Copy and paste code for those who want to deep walk all nested sub directories:
Recursive call with os.listdir():
import os
count = 0
def deep_walk(mypath):
    global count
    for file in os.listdir(mypath):
        file_path = os.path.join(mypath, file)
        if os.path.isdir(file_path):
            deep_walk(file_path)
        else:
            count += 1
            print(file_path)
mypath = "/tmp"
deep_walk(mypath)
print(f"Total file count: {count}")
With os.walk():
import os
def walk_dir(mypath):
    count = 0
    for root, dirs, files in os.walk(mypath):
        for file in files:
            file_path = os.path.join(root, file)
            count += 1
            print(file_path)
    print(f"Total file count: {count}")
mypath = "/tmp"
walk_dir(mypath)
The difference is that with os.walk() you don't need to descend into each subdirectory manually; the library does it for you, no matter how deeply the directories are nested.
Upvotes: 0
Reputation: 3054
Here is a version that uses os.scandir and returns a tree structure. os.scandir returns os.DirEntry objects, which cache information about each entry in memory, so most queries about an item need no additional filesystem calls.
import os
def treedir(path):
    files = []
    folders = {}
    for entry in os.scandir(path):
        if entry.is_file():
            files.append(entry)
        elif entry.is_dir():
            folders[entry.name] = treedir(entry)
    result = {}
    if files:
        result['files'] = files
    if folders:
        result['folders'] = folders
    return result
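To illustrate what the DirEntry objects make available, here is a small sketch that lists the regular files directly inside one directory together with their sizes. list_files_with_size is a hypothetical name; the sketch assumes symlinks should not be followed, and note that entry.stat() may still need a system call on some platforms the first time it is used.

```python
import os


def list_files_with_size(path):
    """Yield (name, size_in_bytes) for regular files directly inside path."""
    # Using scandir as a context manager closes the iterator promptly
    with os.scandir(path) as entries:
        for entry in entries:
            # is_file() can usually answer from data gathered during the scan
            if entry.is_file(follow_symlinks=False):
                yield entry.name, entry.stat(follow_symlinks=False).st_size
```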
Upvotes: 1
Reputation: 466
I like the structure of the result of os.walk() but prefer pathlib overall. My lazy solution therefore is simply creating a Path from each item returned by os.walk().
import os
import pathlib
def walk(path='bin'):
    for root, dirs, files in os.walk(path):
        root = pathlib.Path(root)
        dirs = [root / d for d in dirs]
        files = [root / f for f in files]
        yield root, dirs, files
Upvotes: 0
Reputation: 1
import pathlib
import time
def prune_empty_dirs(path: pathlib.Path):
    for current_path in list(path.rglob("*"))[::-1]:
        if current_path.is_dir() and not any(current_path.iterdir()):
            current_path.rmdir()
            while current_path.exists():
                time.sleep(0.1)
Upvotes: 0
Reputation: 43840
os.walk and os.scandir are great options; however, I've been using pathlib more and more, and with pathlib you can use the .glob() or .rglob() (recursive glob) methods:
from pathlib import Path
root_directory = Path(".")
for path_object in root_directory.rglob('*'):
    if path_object.is_file():
        print(f"hi, I'm a file: {path_object}")
    elif path_object.is_dir():
        print(f"hi, I'm a dir: {path_object}")
Upvotes: 59
Reputation: 3391
Another way to walk a directory tree, using the pathlib module:
from pathlib import Path
for directory in Path('.').glob('**'):
    for item in directory.iterdir():
        print(item)
The pattern ** matches the current directory and all subdirectories, recursively, and the method iterdir then iterates over each directory's contents. This is useful when you need more control while traversing the directory tree.
Upvotes: 4
Reputation: 673
For anyone looking for a solution using pathlib (python >= 3.4)
from pathlib import Path
def walk(path):
    for p in Path(path).iterdir():
        if p.is_dir():
            yield from walk(p)
            continue
        yield p.resolve()
# recursively traverse all files from current directory
for p in walk(Path('.')):
    print(p)
# the function returns a generator so if you need a list you need to build one
all_files = list(walk(Path('.')))
However, as mentioned above, this does not preserve the top-down ordering given by os.walk
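If the top-down ordering matters, one possible pathlib-only variant is to report each directory before descending into it. This is a sketch rather than a drop-in replacement for os.walk: walk_top_down is a hypothetical name, entries are sorted for a stable order, and symlinked directories are listed but not followed (an assumption made here to avoid loops).

```python
from pathlib import Path


def walk_top_down(path):
    """Yield (directory, subdirs, files) as Path objects, parents first."""
    root = Path(path)
    subdirs, files = [], []
    for child in sorted(root.iterdir()):
        (subdirs if child.is_dir() else files).append(child)
    # Report this directory before any of its children
    yield root, subdirs, files
    for subdir in subdirs:
        # Do not descend into symlinked directories
        if not subdir.is_symlink():
            yield from walk_top_down(subdir)
```

Each yielded triple mirrors the (root, dirs, files) shape of os.walk, except that every element is a Path.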
Upvotes: 44
Reputation: 105
Since Python 3.4 there is a new module, pathlib. So to get all dirs and files one can do:
from pathlib import Path
dirs = [str(item) for item in Path(path).iterdir() if item.is_dir()]
files = [str(item) for item in Path(path).iterdir() if item.is_file()]
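The two comprehensions above each scan the directory once. If that matters, a single pass can fill both lists; the following is a minimal sketch, with split_dir as a hypothetical helper name and the same str conversion as above.

```python
from pathlib import Path


def split_dir(path):
    """Return (dirs, files) as lists of strings from one directory scan."""
    dirs, files = [], []
    for item in Path(path).iterdir():
        if item.is_dir():
            dirs.append(str(item))
        elif item.is_file():
            files.append(str(item))
    return dirs, files
```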
Upvotes: 3
Reputation: 1604
Since Python 3.4 there exists the generator method Path.rglob. So, to process all paths under some/starting/path, just do something such as:
from pathlib import Path
path = Path('some/starting/path')
for subpath in path.rglob('*'):
    pass  # do something with subpath
To get all subpaths in a list, do list(path.rglob('*')).
To get just the files with the sql extension, do list(path.rglob('*.sql')).
Upvotes: 9
Reputation: 791
Instead of the built-in os.walk and os.path.walk, I use something derived from this piece of code I found suggested elsewhere which I had originally linked to but have replaced with inlined source:
import os
import stat
class DirectoryStatWalker:
    # a forward iterator that traverses a directory tree, and
    # returns the filename and additional file information
    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0
    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                st = os.stat(fullname)
                mode = st[stat.ST_MODE]
                if stat.S_ISDIR(mode) and not stat.S_ISLNK(mode):
                    self.stack.append(fullname)
                return fullname, st
if __name__ == '__main__':
    for file, st in DirectoryStatWalker("/usr/include"):
        print(file, st[stat.ST_SIZE])
It walks the directories recursively and is quite efficient and easy to read.
Upvotes: 1
Reputation: 10891
If you want to recursively iterate through all the files, including all files in the subfolders, I believe this is the best way.
import os
def get_files(input):
    for fd, subfds, fns in os.walk(input):
        for fn in fns:
            yield os.path.join(fd, fn)
## now this will print all full paths
for fn in get_files('.'):  # '.' stands in for your starting directory
    print(fn)
Upvotes: 4
Reputation: 18109
I've not tested this extensively yet, but I believe this will expand the os.walk generator, join dirnames to all the file paths, and flatten the result, to give a straight-up sequence of the concrete files in your search path.
import itertools
import os
def find(input_path):
    return itertools.chain(
        *list(
            list(os.path.join(dirname, fname) for fname in files)
            for dirname, _, files in os.walk(input_path)
        )
    )
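The version above builds every intermediate list before chaining them. A possibly lighter variant, sketched here under the assumption that a lazy iterator is acceptable, uses itertools.chain.from_iterable so paths are produced as the tree is walked. find_lazy is a hypothetical name.

```python
import itertools
import os


def find_lazy(input_path):
    """Lazily yield the full path of every file below input_path."""
    return itertools.chain.from_iterable(
        # One inner generator per directory, consumed before the walk advances
        (os.path.join(dirname, fname) for fname in files)
        for dirname, _, files in os.walk(input_path)
    )
```

Wrap the call in list() if an actual list is needed.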
Upvotes: 0
Reputation: 40150
While googling for the same info, I found this question.
I am posting here the smallest, clearest code which I found at http://www.pythoncentral.io/how-to-traverse-a-directory-tree-in-python-guide-to-os-walk/ (rather than just posting the URL, in case of link rot).
The page has some useful info and also points to a few other relevant pages.
# Import the os module, for the os.walk function
import os
# Set the directory you want to start from
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
Upvotes: 0
Reputation: 114481
Indeed, using items += [item] is bad for many reasons...
The append method was made exactly for that (appending one element to the end of a list).
You are creating a temporary one-element list just to throw it away. While raw speed should not be your first concern when using Python (otherwise you're using the wrong language), wasting speed for no reason still doesn't seem right.
You are relying on a small asymmetry of the Python language: for list objects, writing a += b is not the same as writing a = a + b, because the former modifies the object in place while the latter allocates a new list, and this can have different semantics if the object a is also reachable in other ways. In your specific code this doesn't seem to be the case, but it could become a problem later when someone else (or yourself in a few years, which is the same thing) has to modify the code. Python even has an extend method, with a less subtle syntax, that is made specifically for modifying a list in place by adding the elements of another list at the end.
Also, as others have noted, it seems that your code is trying to do what os.walk already does...
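The in-place versus new-list difference can be seen in a few lines. This is only an illustrative sketch; show_aliasing and the variable names are made up for the example.

```python
def show_aliasing():
    a = [1, 2]
    alias = a            # second name for the same list object
    a += [3]             # modifies the list in place, so alias sees it
    after_iadd = list(alias)

    b = [1, 2]
    other = b
    b = b + [3]          # builds a new list; other keeps the old one
    after_add = list(other)
    return after_iadd, after_add


print(show_aliasing())   # ([1, 2, 3], [1, 2])
```

Using append for one element, or extend for several, makes the in-place intent explicit.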
Upvotes: 3
Reputation: 23208
Take a look at the os.walk function, which returns the path along with the directories and files it contains. That should considerably shorten your solution.
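As a rough sketch of how short it can get, assuming only the top level of path is wanted: the first triple produced by os.walk already describes path itself. The function name is kept from the question, and the empty result for a missing or unreadable path is an assumption of this sketch.

```python
import os


def dir_contents(path):
    """Return (files, folders) for the top level of path only."""
    # os.walk yields (dirpath, dirnames, filenames) top-down, so the
    # first triple describes path itself.
    for _root, folders, files in os.walk(path):
        return files, folders
    # os.walk yields nothing if path cannot be listed
    return [], []
```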
Upvotes: 47
Reputation: 8055
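def dir_contents(path):
    files, folders = [], []
    for p in listdir(path):
        # join with path so the check also works outside the current directory
        # (listdir, isfile and join come from the question's star imports)
        if isfile(join(path, p)): files.append(p)
        else: folders.append(p)
    return files, folders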
Upvotes: 3