user13058902

Implementation of multithreading using concurrent.futures in Python

I have written a Python script that converts raw data (from an STM microscope) into PNG format, and it runs perfectly on my MacBook Pro.

Below is the simplified Python code:

for root, dirs, file in os.walk(path):
    for dir in dirs:
        fpath = path + '/' + dir
        os.chdir(fpath)
        spaths = savepath + '/' + dir
        if not os.path.exists(spaths):
            os.mkdir(spaths)

        for files in glob.glob("*.sm4"):
            for file in files:
                data_conv(files, file, spaths)

But it takes 30–40 minutes for 100 files.

Now I want to reduce the processing time using multithreading (the concurrent.futures library). I tried to modify the code using a YouTube video, "Python Threading Tutorial", as an example.

But I have to pass too many arguments, such as root, dirs, and file, to the executor.map() method, and I don't know how to resolve this.
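From the tutorial, my understanding is that executor.map() takes one iterable per positional parameter of the mapped function (a toy example, not my real code):

```python
from concurrent.futures import ThreadPoolExecutor

def add(a, b):
    return a + b

# map() pairs up the iterables element-wise: add(1, 10), add(2, 20), add(3, 30)
with ThreadPoolExecutor() as executor:
    results = list(executor.map(add, [1, 2, 3], [10, 20, 30]))

print(results)  # [11, 22, 33]
```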

Below is the simplified multithreaded Python code:

def raw_data(root, dirs, file):
    for dir in dirs:
        fpath = path + '/' + dir
        os.chdir(fpath)
        spaths = savepath + '/' + dir
        if not os.path.exists(spaths):
            os.mkdir(spaths)

        for files in glob.glob("*.sm4"):
            for file in files:
                data_conv(files, file, spaths)

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(raw_data, root, dirs, file)

NameError: name 'root' is not defined

Any suggestion is appreciated. Thank you.

Upvotes: 0

Views: 534

Answers (2)

TheNoneMan

Reputation: 31

First of all, as Iain Shelvington pointed out, data_conv looks like a CPU-intensive function, so you won't see an improvement with ThreadPoolExecutor; use ProcessPoolExecutor instead. Second, you have to pass the parameters for every function call, i.e. pass iterables of arguments to executor.map. Assuming root and file are the same for each call and dirs is a list:

with concurrent.futures.ProcessPoolExecutor() as executor:
    results = executor.map(raw_data, [root]*len(dirs), dirs, [file]*len(dirs))
    for result in results:
        ...  # collect your results

As a side note, you may find working with the filesystem more pleasant with pathlib, which has been part of the standard library since Python 3.4.
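For example, a rough pathlib version of your directory loop might look like this (the temp-dir demo tree here is just a stand-in for your real input and output roots, and jobs collects (file, output dir) pairs instead of calling data_conv):

```python
from pathlib import Path
import tempfile

# Stand-in demo tree; replace with your real input/output roots
path = Path(tempfile.mkdtemp())
savepath = Path(tempfile.mkdtemp())
(path / "scan1").mkdir()
(path / "scan1" / "a.sm4").touch()

jobs = []
for fpath in (p for p in path.iterdir() if p.is_dir()):
    spaths = savepath / fpath.name
    # mkdir(exist_ok=True) replaces the os.path.exists()/os.mkdir() pair
    spaths.mkdir(parents=True, exist_ok=True)
    jobs += [(f, spaths) for f in fpath.glob("*.sm4")]

print(jobs)
```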

Upvotes: 0

user13058902

Thanks for the advice, Iain Shelvington and TheNoneMan.

pathlib does reduce the clutter I had in my code.

"ProcessPoolExecutor" worked in my CPU intense function.

with concurrent.futures.ProcessPoolExecutor() as executor:
    executor.map(raw_data, os.walk(path))
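One thing to note with mapping over os.walk(path): each worker receives a whole (root, dirs, files) tuple as a single argument, so raw_data has to accept one tuple, and there is at most one task per directory. Flattening the walk into a list of files first gives finer-grained parallelism (a sketch; convert_one is a hypothetical stand-in for the real data_conv, and "data" is a placeholder path):

```python
import concurrent.futures
import os

def convert_one(fpath):
    # Hypothetical stand-in for data_conv: here it just returns the filename
    return os.path.basename(fpath)

def all_sm4_files(path):
    # Flatten the directory walk into one stream of .sm4 file paths
    for root, dirs, files in os.walk(path):
        for name in files:
            if name.endswith(".sm4"):
                yield os.path.join(root, name)

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(convert_one, all_sm4_files("data")))
```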

Upvotes: 1
