amarjeet kushwaha

Reputation: 1

Google Cloud Run Jobs Task Parallelism Not Working?

I wanted to learn about Cloud Run jobs' task parallelism, so I created a job with the number of tasks set to 10, and I am giving each container 8 GB of memory and 8 CPUs.

I have set task parallelism to "Run as many tasks concurrently as possible".

But when I run this job, it runs the 10 tasks one after another instead of in parallel. Why is it not running all 10 tasks in parallel?

Cloud run jobs running 10 task one after other

I have a 1.6 GB file, and below is the code I am running on the Cloud Run instances to process the file and load the data into BigQuery (BQ), just for testing.

import numpy as np
import os
import pandas as pd


project_id = os.environ.get("PROJECT_ID")
# Cloud Run sets these for every task; the defaults allow a local single-task run
task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", 0))
nb_task = int(os.environ.get("CLOUD_RUN_TASK_COUNT", 1))
print(nb_task)

df_original=pd.read_csv("gs://dataproc-input-321/train.csv")
df_len=len(df_original)
print(df_len)


# Rows per task (integer division)
batch_size = df_len // nb_task
print(f"batch size: {batch_size}")


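start_row_no = batch_size * task_index
# iloc's end index is exclusive, so no "- 1" here; the last task also takes the remainder rows
end_row_no = df_len if task_index == nb_task - 1 else batch_size * (task_index + 1)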


print(f"For task_id {task_index} , start is {start_row_no} and end is {end_row_no}")
# Copy the slice so that adding a column below does not raise SettingWithCopyWarning
df_sliced = df_original.iloc[start_row_no:end_row_no].copy()
del df_original


df_sliced["type"]=np.where(df_sliced["PRODUCT_LENGTH"]%2==0,"even","odd")

df_even=df_sliced[df_sliced["type"] == 'even']
df_odd=df_sliced[df_sliced["type"] == 'odd']


df_even.to_gbq('output_dataset.amazon_even', 
                 project_id=project_id,
                 if_exists='append'
                 )
df_odd.to_gbq('output_dataset.amazon_odd', 
                 project_id=project_id,
                 if_exists='append'
                 )
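
To make the intended split explicit, here is a minimal, pandas-free sketch of how each task's row range can be derived from the task index and task count. The helper name `task_row_range` and the sample numbers (103 rows, 10 tasks) are only illustrative assumptions, not part of the job itself. It treats the end index as exclusive, to match `iloc`, and gives the leftover rows to the last task.

```python
from typing import Tuple


def task_row_range(total_rows: int, task_index: int, task_count: int) -> Tuple[int, int]:
    """Return the half-open [start, end) row range handled by one task.

    Hypothetical helper for illustration; the end index is exclusive,
    which matches how DataFrame.iloc slicing works.
    """
    if task_count <= 0 or not 0 <= task_index < task_count:
        raise ValueError("task_index must be in [0, task_count)")
    batch_size = total_rows // task_count
    start = batch_size * task_index
    # The last task also picks up the rows left over by integer division.
    end = total_rows if task_index == task_count - 1 else start + batch_size
    return start, end


# Illustrative numbers only: 103 rows split across 10 tasks.
ranges = [task_row_range(103, i, 10) for i in range(10)]
for i, (start, end) in enumerate(ranges):
    print(f"task {i}: rows [{start}, {end})")
```

With this split, the ranges are contiguous and together cover every row exactly once, whether the tasks run in parallel or one after another.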

I think 8 GB of memory and 8 CPUs are enough to process 1.6 GB. So why is it not running everything in parallel, if each instance gets 8 GB of memory and 8 CPUs?

There is an option to limit the parallelism, called "Limit the number of concurrent tasks", but when I enter any value greater than 0, it throws the error "Must be no higher than 0 for selected CPU and memory on the region".

I tried changing the region, but it still does not let me set the parallelism.

Can anyone explain:

  1. Why is it not letting me set the parallelism manually?
  2. Why is it running a single task at a time, 10 times over, when "Run as many tasks concurrently as possible" is checked?
  3. I noticed that when I gave 32 GB of memory to each instance, Cloud Run complained that it needs at least 4 CPUs for 32 GB of memory. Why is that?

Thanks in advance.

  1. I tried to set the task parallelism manually, but it does not let me set a value greater than 0.
  2. I tried running the job in a different region, but I still could not set the task parallelism.

Upvotes: 0

Views: 154

Answers (0)
