How to make the following loop use multiple cores in Python?

This is normal Python code that runs as expected:

import pandas as pd
dataset=pd.read_csv(r'C:\Users\efthi\Desktop\machine_learning.csv')
registration = pd.read_csv(r'C:\Users\efthi\Desktop\studentVle.csv')


students = list()
result = list()
p = 350299
i = 749
interactions = 0
while i < 8659:
    student = dataset["id_student"][i]
    print(i)
    i += 1
    # Sum this student's clicks over every row of the registration file
    while p < 1917865:
        if student == registration["id_student"][p]:
            interactions += registration["sum_click"][p]
        p += 1
    students.insert(i, student)
    result.insert(i, interactions)
    p = 0
    interactions = 0


st = pd.DataFrame(students)  # create data frame
st.to_csv(r'C:\Users\efthi\Desktop\ttest.csv', index=False)  # write data frame to csv

st = pd.DataFrame(result)  # create data frame
st.to_csv(r'C:\Users\efthi\Desktop\results.csv', index=False)  # write data frame to csv

This is supposed to run on an even bigger dataset, so I think it would be more efficient to utilize the multiple cores of my PC.

How can I implement it to use all 4 cores?

Upvotes: 1

Views: 4726

Answers (1)

Michael Silverstein

Reputation: 1843

For performing any function in parallel you can use something like:

import multiprocessing
import pandas as pd

def f(x):
    # Perform some function on one chunk of the data and return the result
    return x

# Look at the docs to see why "if __name__ == '__main__'" is necessary
if __name__ == '__main__':
    # Load your data
    data = pd.read_csv('file.csv')
    # Create pool with 4 processors
    pool = multiprocessing.Pool(4)
    # Create jobs
    jobs = []
    for group in data['some_group'].unique():
        # Create asynchronous jobs that will be submitted once a processor is ready
        data_for_job = data[data.some_group == group]
        jobs.append(pool.apply_async(f, (data_for_job, )))
    # Wait for the jobs and collect their results
    results = [job.get() for job in jobs]
    pool.close()
    pool.join()
    # Combine results
    results_df = pd.concat(results)
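
Applied to your own data, a rough sketch (assuming the same column names as in your question, id_student and sum_click, and the same file paths; the chunking by student id and the helper name sum_clicks are my own choices) could look like:

import multiprocessing
import numpy as np
import pandas as pd

def sum_clicks(student_ids, registration):
    # Total clicks per student for one chunk of student ids
    sub = registration[registration['id_student'].isin(student_ids)]
    totals = sub.groupby('id_student')['sum_click'].sum()
    # Students with no rows in registration get 0 clicks
    return totals.reindex(student_ids, fill_value=0)

if __name__ == '__main__':
    dataset = pd.read_csv(r'C:\Users\efthi\Desktop\machine_learning.csv')
    registration = pd.read_csv(r'C:\Users\efthi\Desktop\studentVle.csv')

    # Split the student ids into 4 chunks, one per core
    chunks = np.array_split(dataset['id_student'].values, 4)

    pool = multiprocessing.Pool(4)
    jobs = [pool.apply_async(sum_clicks, (chunk, registration)) for chunk in chunks]
    results = pd.concat([job.get() for job in jobs])
    pool.close()
    pool.join()

    results.to_csv(r'C:\Users\efthi\Desktop\results.csv')

Note that each worker receives its own copy of registration, so memory use grows with the number of processes; whether the speedup pays off depends on how expensive the per-chunk work is.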

Regardless of the function you're performing, for multiprocessing you:

  1. Create a pool with your desired number of processors
  2. Loop through your data in whatever way you want to chunk it
  3. Create a job with that chunk (using pool.apply_async() <- read the docs about this if it's confusing)
  4. Wait for your jobs to finish and collect their results with job.get()
  5. Combine your results
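
If you don't need the flexibility of apply_async, the same workflow can also be written more compactly with Pool.map, which hands out the chunks and collects the results in order ('file.csv' and 'some_group' are the same placeholders as above):

import multiprocessing
import pandas as pd

def f(chunk):
    # Perform some function on one chunk of the data
    return chunk

if __name__ == '__main__':
    data = pd.read_csv('file.csv')
    # One chunk per group
    chunks = [group for _, group in data.groupby('some_group')]
    with multiprocessing.Pool(4) as pool:
        # map() submits the chunks and waits for all the results
        results = pool.map(f, chunks)
    results_df = pd.concat(results)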

Upvotes: 2
