Reputation:
I have written Python code that converts raw data (from an STM microscope) into PNG format, and it runs perfectly on my MacBook Pro.
Below is the simplified Python code:
for root, dirs, file in os.walk(path):
    for dir in dirs:
        fpath = path + '/' + dir
        os.chdir(fpath)
        spaths = savepath + '/' + dir
        if not os.path.exists(spaths):
            os.mkdir(spaths)
        for files in glob.glob("*.sm4"):
            for file in files:
                data_conv(files, file, spaths)
But it takes 30-40 minutes for 100 files.
Now I want to reduce the processing time using multithreading (with the concurrent.futures library). I tried to modify my code using a YouTube video, “Python Threading Tutorial”, as an example.
But I have to pass several arguments, such as “root”, “dirs” and “file”, to the executor.map() method, and I don’t know how to do that.
Below is the simplified multithreaded Python code:
def raw_data(root, dirs, file):
    for dir in dirs:
        fpath = path + '/' + dir
        os.chdir(fpath)
        spaths = savepath + '/' + dir
        if not os.path.exists(spaths):
            os.mkdir(spaths)
        for files in glob.glob("*.sm4"):
            for file in files:
                data_conv(files, file, spaths)

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(raw_data, root, dirs, file)
NameError: name 'root' is not defined
Any suggestions are appreciated. Thank you.
Upvotes: 0
Views: 534
Reputation: 31
First of all, as Iain Shelvington pointed out, data_conv seems to be a CPU-intensive function, so you won't notice an improvement with ThreadPoolExecutor; use ProcessPoolExecutor instead. Second, you have to pass parameters to each call of the function, i.e. pass lists of arguments to raw_data. Assuming root and file are the same for every call and dirs is a list:
with concurrent.futures.ProcessPoolExecutor() as executor:
    results = executor.map(raw_data, [root]*len(dirs), dirs, [file]*len(dirs))
    for result in results:
        pass  # Collect your results
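A possible variation on the call above, as a sketch only: `itertools.repeat` can replace the `[root]*len(dirs)` lists, because `executor.map` stops as soon as the shortest iterable (here `dirs`) runs out. The names `convert_dir` and `convert_all`, and the example folder names, are hypothetical stand-ins for `raw_data` and the real data.

```python
import concurrent.futures
from itertools import repeat


def convert_dir(root, dirname, suffix):
    # Hypothetical stand-in for raw_data: only reports what it was given.
    return f"{root}/{dirname}/*{suffix}"


def convert_all(root, dirs, suffix,
                executor_cls=concurrent.futures.ProcessPoolExecutor):
    # repeat(x) yields x forever; map() pairs the arguments like zip(),
    # so it stops once `dirs` is exhausted.
    with executor_cls() as executor:
        return list(executor.map(convert_dir, repeat(root), dirs, repeat(suffix)))
```

Calling `convert_all("scans", ["day1", "day2"], ".sm4")` runs one `convert_dir` call per folder. On macOS, where worker processes are spawned rather than forked, that call should sit under an `if __name__ == "__main__":` guard.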
As a side note, you may find working with the filesystem more pleasant with pathlib, which has been part of the standard library since Python 3.4.
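To illustrate the pathlib suggestion, here is a minimal sketch of how the folder handling from the question might look. It assumes the output tree should mirror the input tree and that each `.sm4` file maps to one `.png` file; the function name `mirror_tree` is hypothetical.

```python
from pathlib import Path


def mirror_tree(src_root, dst_root, pattern="*.sm4"):
    """Create mirrored output folders and return (source, target) path pairs."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    jobs = []
    for src in sorted(src_root.rglob(pattern)):  # searches all sub-folders
        # Keep the sub-folder layout of the source tree under dst_root.
        dst = dst_root / src.relative_to(src_root).with_suffix(".png")
        # One call replaces the os.path.exists() + os.mkdir() pair.
        dst.parent.mkdir(parents=True, exist_ok=True)
        jobs.append((src, dst))
    return jobs
```

Because every path is absolute or relative to the given roots, no `os.chdir()` is needed. The returned pairs can then be handed to an executor one file at a time.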
Upvotes: 0
Reputation:
Thanks for the advice, Iain Shelvington and Thenoneman.
pathlib does reduce the clutter I had in my code.
ProcessPoolExecutor worked for my CPU-intensive function.
with concurrent.futures.ProcessPoolExecutor() as executor:
    executor.map(raw_data, os.walk(path))
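With a single iterable, `executor.map` passes each item from `os.walk(path)` to the worker as one argument, so the worker has to accept a whole `(dirpath, dirnames, filenames)` tuple. A minimal sketch of such a worker follows. It only counts `.sm4` files instead of converting them, and `count_sm4` and `scan` are hypothetical names.

```python
import concurrent.futures
import os


def count_sm4(walk_entry):
    # Each os.walk() item is one (dirpath, dirnames, filenames) tuple.
    dirpath, _dirnames, filenames = walk_entry
    # Build full paths from dirpath instead of calling os.chdir().
    matches = [os.path.join(dirpath, name)
               for name in filenames if name.endswith(".sm4")]
    return dirpath, len(matches)


def scan(path, executor_cls=concurrent.futures.ProcessPoolExecutor):
    # One task per directory visited by os.walk().
    with executor_cls() as executor:
        return dict(executor.map(count_sm4, os.walk(path)))
```

Since this creates one task per directory, the speed-up is limited by the number of directories; submitting one task per file would spread the work more evenly.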
Upvotes: 1