Reputation: 11
I have to loop through a large number of folders (around 60,000) at a given path (My_Path). Then, in each folder in My_Path, I have to check whether any file name contains a certain date. I want to know if there is a faster way than looping through them one by one with the os library; the current run takes around one hour.
My_Path:
- Folder 1
  - File 1
  - File 2
  - File 3
  - …
- Folder 2
- Folder 3
- …
- Folder 60 000
import os

My_Path = r'\\...\...\...\...'

mylist2 = os.listdir(My_Path)  # list of ~60,000 folder names
for folder in mylist2:
    # list of all files in this folder
    mylist = os.listdir(os.path.join(My_Path, folder))
    for file in mylist:
        Check_Function(file)
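For illustration, assume Check_Function is something minimal like a date-substring test (the real function is not shown here; the name and date below are placeholders):

def Check_Function(file_name):
    # Hypothetical placeholder: report file names containing the target date.
    if "2019-01-01" in file_name:
        print(file_name)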
As said, the actual run takes around one hour, and I want to know if there is a more optimal solution. Thanks!
Upvotes: 1
Views: 1634
Reputation: 13447
As others have already suggested, you can first obtain the list lst of all file paths, as in this answer, and then fan your Check_Function out over that list with multiprocessing.
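For completeness, a minimal sketch of building lst with os.walk (the linked answer may do this differently; My_Path is the placeholder from the question):

import os

My_Path = r'\\...\...\...\...'

# Flatten the directory tree into one list of full file paths.
lst = [os.path.join(path, name)
       for path, dirs, files in os.walk(My_Path)
       for name in files]

With the list in hand, the multiprocessing part looks like this: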
import multiprocessing as mp

def parallelize(fun, vec, cores):
    # Distribute the elements of vec across a pool of worker processes.
    with mp.Pool(cores) as p:
        res = p.map(fun, vec)
    return res
and run:

res = parallelize(Check_Function, lst, mp.cpu_count() - 1)
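One caveat: on Windows (which the UNC path in the question suggests), the multiprocessing entry point must sit behind an import guard, or the child processes will re-execute the module top to bottom:

if __name__ == "__main__":
    # Required on Windows: each child process imports this module,
    # and the guard keeps it from spawning a pool of its own.
    res = parallelize(Check_Function, lst, mp.cpu_count() - 1)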
Update

Given that Check_Function is most likely I/O-bound rather than CPU-bound, you can even use more workers than you have cores.
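For I/O-bound work, threads are also worth a try, since they avoid process start-up and pickling overhead; here is a minimal sketch with the standard concurrent.futures module (the worker count of 32 is just a starting guess to tune):

from concurrent.futures import ThreadPoolExecutor

# Threads start quickly and share memory; for I/O-bound work the
# worker count can safely exceed the CPU count.
with ThreadPoolExecutor(max_workers=32) as ex:
    res = list(ex.map(Check_Function, lst))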
Upvotes: 2
Reputation: 2576
Try os.walk(); it might be faster:
import os

My_Path = r'\\...\...\...\...'

# os.walk traverses the whole tree in a single pass.
for path, dirs, files in os.walk(My_Path):
    for file in files:
        Check_Function(os.path.join(path, file))
If it is not faster, it may be your Check_Function that is eating up the cycles.
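To see where the time actually goes, you could time the bare traversal on its own; a rough sketch with the standard time module:

import os
import time

My_Path = r'\\...\...\...\...'

# If the bare walk already takes close to the full hour, the
# bottleneck is the (network) file system, not Check_Function.
t0 = time.perf_counter()
n = sum(len(files) for _, _, files in os.walk(My_Path))
print(f"walked {n} files in {time.perf_counter() - t0:.1f}s")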
Upvotes: 2