Reputation: 541
I am trying to use python to create the files needed to run some other software in batch. For part of this I need to produce a text file that loads the needed data files into the software. My problem is that the files I need to enter into this text file are stored in a set of structured folders.
I need to loop over a set of folders (up to 20), which each could contain up to 3 more folders which contain the files I need. The bottom level of the folders contain a set of files needed for each run of the software. The text file should have the path+name of these files printed line by line, add an instruction line and then move to the next set of files from a folder and so on until all of sub level folders have been checked.
Upvotes: 43
Views: 117895
Reputation: 127
This will help to list specific file extension. In my sub-folders i have many files but i am only interested parquet files.
import os
dir = r'/home/output/'
def list_files(dir):
r = []
for root, dirs, files in os.walk(dir):
for name in files:
filepath = root + os.sep + name
if filepath.endswith(".snappy.parquet"):
r.append(os.path.join(root, name))
return r
Upvotes: 3
Reputation: 2133
Use os.walk(). The following will output a list of all files within the subdirectories of "dir". The results can be manipulated to suit you needs:
import os
def list_files(dir):
r = []
subdirs = [x[0] for x in os.walk(dir)]
for subdir in subdirs:
files = os.walk(subdir).next()[2]
if (len(files) > 0):
for file in files:
r.append(os.path.join(subdir, file))
return r
For python 3, change next()
to __next__()
.
Upvotes: 35
Reputation: 1045
Charles' answer is good, but can be improved upon to increase speed and efficiency. Each item produced by os.walk() (See docs) is a tuple of three items. Those items are:
Knowing this, much of Charles' code can be condensed with the modification of a forloop:
import os
def list_files(dir):
r = []
for root, dirs, files in os.walk(dir):
for name in files:
r.append(os.path.join(root, name))
return r
Upvotes: 86