Jek Denys

Reputation: 115

Processing multiple data files simultaneously using multiple cores

I have multiple data files that I process using the Python pandas library. Each file is processed one by one, and only one logical processor is used when I look at Task Manager (it sits at ~95%, while the rest stay below 5%).

Is there a way to process the data files simultaneously? If so, is there a way to utilize the other logical processors to do that?

(Edits are welcome)

Upvotes: 1

Views: 3679

Answers (2)

Diego

Reputation: 1282

If your file names are in a list, you could use this code:

from multiprocessing import Process

def YourCode(filename, otherdata):
    # Do your stuff
    pass

if __name__ == '__main__':
    # Post-process files in parallel
    ListOfFilenames = ['file1', 'file2', ..., 'file1000']
    Processors = 20   # number of processes to run at the same time
    otherdata = None  # placeholder for whatever extra data YourCode needs

    # Divide the list of files into chunks of 'Processors' files each
    Parts = [ListOfFilenames[i:i + Processors] for i in range(0, len(ListOfFilenames), Processors)]

    for part in Parts:
        ListOfProcesses = []
        for f in part:
            p = Process(target=YourCode, args=(f, otherdata))
            p.start()
            ListOfProcesses.append(p)
        # wait for the current batch to finish before starting the next one
        for p in ListOfProcesses:
            p.join()
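If you would rather not manage the batching yourself, multiprocessing.Pool keeps a fixed number of worker processes busy and distributes the file list for you. A minimal sketch, assuming each file is a CSV; the file names and the pandas work inside process_file are placeholders for your own code:

from multiprocessing import Pool

import pandas as pd

def process_file(filename):
    # hypothetical per-file work: read the CSV and return a small summary
    df = pd.read_csv(filename)
    return df.describe()

if __name__ == '__main__':
    filenames = ['file1.csv', 'file2.csv', 'file3.csv']  # replace with your own list
    # 4 worker processes; adjust to the number of cores you want to use
    with Pool(processes=4) as pool:
        results = pool.map(process_file, filenames)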

Upvotes: 1

KimKulling

Reputation: 2843

You can process the different files in different threads or in different processes.

The good thing about Python is that its standard library provides the tools to do this:

from multiprocessing import Process

def process_panda(filename):
    # this function will be run in a separate process;
    # process_panda_import() and write_results() stand for your own pandas code
    process_panda_import()
    write_results()

if __name__ == '__main__':
    # start process 1
    p1 = Process(target=process_panda, args=('file1',))
    p1.start()
    # start process 2
    p2 = Process(target=process_panda, args=('file2',))
    p2.start()
    # wait until process 2 has finished
    p2.join()
    # wait until process 1 has finished
    p1.join()

The program will start two child processes, which can be used to process your files. Of course, you can do something similar with threads.
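For example, a thread-based version of the same idea might look like the sketch below (the body of process_panda is a placeholder). Keep in mind that for CPU-bound pandas work, processes usually parallelize better than threads because of the GIL:

from threading import Thread

def process_panda(filename):
    # placeholder for the same per-file work as above
    print('processing', filename)

if __name__ == '__main__':
    t1 = Thread(target=process_panda, args=('file1',))
    t2 = Thread(target=process_panda, args=('file2',))
    t1.start()
    t2.start()
    t1.join()
    t2.join()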

You can find the documentation here: https://docs.python.org/2/library/multiprocessing.html

and here:

https://pymotw.com/2/threading/

Upvotes: 0
