Naruto

Reputation: 139

How to use multithreading/multiprocessing in place of a for loop over a pandas DataFrame

I am working on a project that validates data based on the values in each row of a dataframe. My current approach is sequential: it loops over the rows and validates them one at a time.

for index in mt.index:
    # read the file for this row
    # perform the validation
    ...

I want to use multithreading or multiprocessing to reduce the processing time, because the current approach takes longer than expected. Can anyone suggest how to implement multithreading/multiprocessing here to improve my script's performance?

Upvotes: 0

Views: 97

Answers (1)

Nicolas Busca

Reputation: 1305

You can use the Pool API:

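from multiprocessing import Pool

def validate(index):
    # do the validation work for a given index here
    ...

if __name__ == "__main__":
    # create the pool after validate is defined so the workers can find it
    with Pool() as p:
        result = p.map(validate, mt.index)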

The map function parallelizes the loop over the values of mt.index: it calls validate on each value in a worker process and returns the results in the same order. Check out these docs for more options.
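To make the idea concrete, here is a minimal, self-contained sketch of the same pattern. Everything specific in it is an assumption for illustration: the sample dataframe, the `path` and `expected_rows` columns, and the validation rule are hypothetical stand-ins, since the question does not show what `mt` contains or what the validation does. It passes whole rows to the workers rather than just the index, so each worker has the data it needs without access to the dataframe itself.

```python
from multiprocessing import Pool

import pandas as pd


def validate(row):
    """Return (index, is_valid) for one row. The rule is a placeholder."""
    index, path, expected_rows = row
    # Hypothetical check: a real script would open `path` and inspect it.
    is_valid = bool(path.endswith(".csv") and expected_rows > 0)
    return index, is_valid


if __name__ == "__main__":
    # Hypothetical stand-in for the `mt` dataframe from the question.
    mt = pd.DataFrame(
        {"path": ["a.csv", "b.txt", "c.csv"], "expected_rows": [10, 5, 0]}
    )

    # itertuples(name=None) yields plain tuples (index, path, expected_rows),
    # which are cheap to pickle and send to the worker processes.
    rows = list(mt.itertuples(index=True, name=None))

    with Pool(processes=2) as pool:
        results = pool.map(validate, rows)

    # Put the per-row results back into a column aligned on the index.
    mt["valid"] = pd.Series(dict(results))
    print(mt)
```

If most of the time goes into reading files rather than computing, threads may be enough: `multiprocessing.pool.ThreadPool` offers the same `map` interface, so swapping it in for `Pool` is a small change worth timing against the process-based version.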

Upvotes: 1
