John

Reputation: 771

How to do real parallel programming in Python?

I want to use parallel processing to speed up a task in Python.

I used apply_async, but CPU usage only reaches 30%. How can I fully utilize the CPU?

Below is my code.

import numpy as np
import pandas as pd
import multiprocessing

def calc_score(df, i, j, score):
    score[i,j] = df.loc[i, 'data'] + df.loc[j, 'data']

if __name__ == '__main__':
    df = pd.read_csv('data.csv')
    score = np.zeros([100, 100])
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    for i in range(100):
        for j in range(100):
            pool.apply_async(calc_score, (df, i, j, score))
    pool.close()
    pool.join()

Thank you very much.

Upvotes: 0

Views: 1543

Answers (2)

zvone

Reputation: 19382

"CPU utilization" should be about performance, i.e. you want to do the job in as little time as possible. There is no generic way to do that. If there were a generic way to optimize software, there would be no slow software, right?

You seem to be looking for a different thing: spending as much CPU time as possible, so that the CPU does not sit idle. That may seem like the same thing, but it absolutely is not.
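To illustrate the difference on the toy computation from the question, here is a minimal sketch that assumes the score really is just the pairwise sum of the 'data' column and that the DataFrame has the default 0..n-1 index. The function names and the np.arange stand-in for df['data'].to_numpy() are made up for the example. The vectorized version runs in a single process and will show low CPU usage, yet for a problem this small it is likely to finish well before a process pool has even started.

```python
import numpy as np

def pairwise_sum_loop(values):
    """Reference version: fill the score matrix one cell at a time."""
    n = len(values)
    score = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            score[i, j] = values[i] + values[j]
    return score

def pairwise_sum_vectorized(values):
    """Same result via broadcasting: column vector plus row vector."""
    values = np.asarray(values, dtype=float)
    return values[:, None] + values[None, :]

if __name__ == '__main__':
    # Hypothetical stand-in for df['data'].to_numpy() from the question.
    data = np.arange(100, dtype=float)
    assert np.array_equal(pairwise_sum_loop(data), pairwise_sum_vectorized(data))
```

Whether this applies to your real workload depends on whether the real scoring function can be expressed with array operations.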

Anyway, if you want to spend 100% of CPU time, this script will do that for you:

import time
import multiprocessing

def loop_until_t(t):
    while time.time() < t:
        pass

def waste_cpu_for_n_seconds(num_seconds, num_processes=multiprocessing.cpu_count()):
    t0 = time.time()
    t = t0 + num_seconds
    print("Begin spending CPU time (in {} processes)...".format(num_processes))
    with multiprocessing.Pool(num_processes) as pool:
        pool.map(loop_until_t, num_processes*[t])
    print("Done.")

if __name__ == '__main__':
    waste_cpu_for_n_seconds(15)

If, instead, you want your program to run faster, you will not do that with an "illustration for parallel processing", as you call it - you need an actual problem to be solved.
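As for why the code in the question stays around 30%: every apply_async call pickles the whole DataFrame and the score array and sends them to a worker for a single addition, so most of the time goes into shipping data rather than computing. In addition, each worker writes into its own copy of score, so the parent's array is never updated. A minimal sketch of one way to restructure it is below, assuming a plain list stands in for the 'data' column; init_worker, score_row and parallel_scores are names invented for this example. The data is sent once per worker, each task covers a whole row, and results come back as return values.

```python
import multiprocessing

_values = None  # per-worker copy of the data, set once by the initializer

def init_worker(values):
    """Runs once in each worker so the data is not re-sent with every task."""
    global _values
    _values = values

def score_row(i):
    """Compute one full row of the score matrix and return it."""
    return [_values[i] + v for v in _values]

def parallel_scores(values, processes=None):
    """Return the n-by-n score matrix as a list of rows."""
    with multiprocessing.Pool(processes, initializer=init_worker,
                              initargs=(values,)) as pool:
        # One task per row; map returns the rows in input order.
        return pool.map(score_row, range(len(values)))

if __name__ == '__main__':
    # Hypothetical stand-in for df['data'].tolist() from the question.
    data = [float(k) for k in range(100)]
    score = parallel_scores(data)
    print(score[3][4])  # 7.0
```

With work as cheap as one addition per cell, even this version may not beat a plain loop; the pattern only pays off when each row takes a meaningful amount of time to compute.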

Upvotes: 0

Jay

Reputation: 11

You can't utilize 100% CPU with pool = multiprocessing.Pool(multiprocessing.cpu_count()). It starts your worker function on the number of cores you specify, but it also looks for a free core. If you want to utilize the maximum CPU with multiprocessing, you should use the multiprocessing Process class, which lets you keep spawning new processes (note: processes, not threads). But be aware that this can bring the system down if the machine doesn't have enough memory for the processes you spawn.
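A minimal sketch of the Process-based approach, assuming a fixed number of workers (one per core) so the memory concern above does not come up. The helper names split_evenly, worker and scores_with_processes are invented for this example, and a plain list stands in for the 'data' column from the question. Each process gets a contiguous block of rows and sends its results back through a Queue.

```python
import multiprocessing

def split_evenly(n_items, n_workers):
    """Split range(n_items) into at most n_workers contiguous ranges."""
    size, extra = divmod(n_items, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        stop = start + size + (1 if w < extra else 0)
        if stop > start:
            ranges.append(range(start, stop))
        start = stop
    return ranges

def worker(values, rows, queue):
    """Compute the assigned rows and send them back to the parent."""
    queue.put([(i, [values[i] + v for v in values]) for i in rows])

def scores_with_processes(values, n_workers):
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(values, rows, queue))
             for rows in split_evenly(len(values), n_workers)]
    for p in procs:
        p.start()
    score = [None] * len(values)
    # Drain the queue before join() so workers are not blocked on a full pipe.
    for _ in procs:
        for i, row in queue.get():
            score[i] = row
    for p in procs:
        p.join()
    return score

if __name__ == '__main__':
    # Hypothetical stand-in for df['data'].tolist() from the question.
    data = [float(k) for k in range(100)]
    score = scores_with_processes(data, multiprocessing.cpu_count())
    print(score[3][4])  # 7.0
```

This sketch has no error handling: if a worker crashes before putting its result, the parent will wait on queue.get() indefinitely.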

Upvotes: 1
