Ankur
Ankur

Reputation: 33

Python : How to store output of each process during multiprocessing in a list

listJobRunIds = []
    
def run_job(dataset_date):
    command = START_JOB_RUN.format(dataset_date=dataset_date)
    jobRunId =  execute(command=command)
    print("jobRunId:"+jobRunId)
    listJobRunIds.append(jobRunId)
    print(listJobRunIds)
    time.sleep(60)
    return jobRunId

pool = multiprocessing.Pool()
pool = multiprocessing.Pool(processes=3)

listJobRunDates = ['2022-10-01','2022-10-02','2022-10-03','2022-10-04'...]
jobRunId = pool.map(run_job, listJobRunDates)

print(jobRunId)

a. Only 3 process should run in a parallel

b. The execute function runs the job and returns the job run ID.

Date , JobrunId

2022-10-01 , 56728389

2022-10-02 , 56728390

2022-10-03 , 56728391

2022-10-04 , 56728392

c. Actually , once the execution of first pool completes(for first 3 run date and so on) , I want to print their job run IDs in the list listJobRunIds :[56728389,56728390,56728391]

But using the above code ,I am just able to print the last job run ID after every pool

The final print(jobRunId) is giving the list with all the job run IDs like : [56728389,56728390,56728391,56728392,56728393,56728394]

My use case is - once the 1st set of 3 job run complete, then only I want the second set of job run to execute. I am looking for 1st set of values in a list so that I can check the status(RUNNING,SUCCESS) of each jobrunid in a separate function. That new function I will call inside the job_run function after sleep(60)

Upvotes: 0

Views: 853

Answers (1)

tdelaney
tdelaney

Reputation: 77407

If you want to process in groups of 3, split the list and then map. You could count wanted indexes by 3 and use that to pull sections.

pool = multiprocessing.Pool()
pool = multiprocessing.Pool(processes=3)

listJobRunDates = ['2022-10-01','2022-10-02','2022-10-03','2022-10-04'...]

for i in range(0, len(listJobRunDates, 3)):
    job_dates = listJobRunDates[i:i+3]
    print("Processing", job_dates)
    jobRunIds = pool.map(run_job, job_dates)
    for jobRunId in jobRunIds:
        print("Job id", jobRunId)

Upvotes: 2

Related Questions