Reputation: 33
import multiprocessing
import time

# START_JOB_RUN (command template) and execute(...) are defined elsewhere in my code.
listJobRunIds = []

def run_job(dataset_date):
    command = START_JOB_RUN.format(dataset_date=dataset_date)
    jobRunId = execute(command=command)
    print("jobRunId:" + jobRunId)
    listJobRunIds.append(jobRunId)
    print(listJobRunIds)
    time.sleep(60)
    return jobRunId

pool = multiprocessing.Pool(processes=3)
listJobRunDates = ['2022-10-01', '2022-10-02', '2022-10-03', '2022-10-04', ...]
jobRunId = pool.map(run_job, listJobRunDates)
print(jobRunId)
a. Only 3 processes should run in parallel.
b. The execute function runs the job and returns the job run ID.
Date        JobRunId
2022-10-01  56728389
2022-10-02  56728390
2022-10-03  56728391
2022-10-04  56728392
c. Once the first batch of the pool completes (the first 3 run dates, and so on), I want to print their job run IDs as a list, listJobRunIds: [56728389, 56728390, 56728391]
But with the above code I can only print the last job run ID after each batch.
The final print(jobRunId)
gives a list with all the job run IDs, like:
[56728389, 56728390, 56728391, 56728392, 56728393, 56728394]
My use case is: only once the first set of 3 job runs has completed should the second set start. I want the first set's values in a list so that I can check the status (RUNNING, SUCCESS) of each job run ID in a separate function, which I will call inside run_job after the sleep(60) (a rough sketch of that check follows below).
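For illustration, a minimal sketch of such a status-check loop; get_job_status is a hypothetical placeholder for whatever status query the environment actually provides (it is not part of the code above), and it simply polls each job run ID in a batch until it is no longer RUNNING:

import time

def wait_for_batch(job_run_ids, poll_seconds=60):
    # Poll every job run ID in the batch until none of them is still RUNNING.
    # get_job_status(...) is a hypothetical stand-in for the real status call.
    for job_run_id in job_run_ids:
        while get_job_status(job_run_id) == "RUNNING":
            time.sleep(poll_seconds)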
Upvotes: 0
Views: 853
Reputation: 77407
If you want to process in groups of 3, split the list into chunks and map each chunk. You can step through the indexes in increments of 3 and slice out each section.
pool = multiprocessing.Pool(processes=3)
listJobRunDates = ['2022-10-01', '2022-10-02', '2022-10-03', '2022-10-04', ...]

for i in range(0, len(listJobRunDates), 3):
    job_dates = listJobRunDates[i:i+3]
    print("Processing", job_dates)
    jobRunIds = pool.map(run_job, job_dates)  # blocks until this chunk of 3 is done
    for jobRunId in jobRunIds:
        print("Job id", jobRunId)
Upvotes: 2