Reputation: 526
I am trying to create a Dataproc cluster with a time to live of 1 day using the Python SDK. For this purpose, v1beta2 of the Dataproc API introduces the LifecycleConfig object, which is a child of the ClusterConfig object.
I use this object in the JSON file which I pass to the create_cluster method. To set the TTL to one day, I use the field auto_delete_ttl with a value of 86,400 seconds.
The documentation of Google Protocol Buffers is rather specific about how to represent a duration in JSON: durations shall be represented as a string with the suffix s for seconds, and there shall be 0, 3, 6 or 9 fractional digits.
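For reference, that format can be reproduced with the protobuf runtime itself (a minimal sketch, independent of Dataproc, assuming only the protobuf package is installed):
from google.protobuf.duration_pb2 import Duration
from google.protobuf import json_format

# A Duration of one day serializes to the JSON string "86400s"
ttl = Duration(seconds=86400)
print(json_format.MessageToJson(ttl))  # prints "86400s"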
However, if I pass the duration using this format, I get the error:
Parameter to MergeFrom() must be instance of same class: expected google.protobuf.Duration got str
This is how I create the cluster:
from google.cloud import dataproc_v1beta2
project = "your_project_id"
region = "europe-west4"
cluster = "" #see below for cluster JSON file
client = dataproc_v1beta2.ClusterControllerClient(client_options={
    'api_endpoint': '{}-dataproc.googleapis.com:443'.format(region)
})
# Create the cluster
operation = client.create_cluster(project, region, cluster)
The variable cluster holds the JSON object describing the desired cluster:
{
    "cluster_name": "my_cluster",
    "config": {
        "config_bucket": "my_conf_bucket",
        "gce_cluster_config": {
            "zone_uri": "europe-west4-a",
            "metadata": {
                "PIP_PACKAGES": "google-cloud-storage google-cloud-bigquery"
            },
            "subnetwork_uri": "my subnet",
            "service_account_scopes": [
                "https://www.googleapis.com/auth/cloud-platform"
            ],
            "tags": [
                "some tags"
            ]
        },
        "master_config": {
            "num_instances": 1,
            "machine_type_uri": "n1-highmem-4",
            "disk_config": {
                "boot_disk_type": "pd-standard",
                "boot_disk_size_gb": 200,
                "num_local_ssds": 0
            },
            "accelerators": []
        },
        "software_config": {
            "image_version": "1.4-debian9",
            "properties": {
                "dataproc:dataproc.allow.zero.workers": "true",
                "yarn:yarn.log-aggregation-enable": "true",
                "dataproc:dataproc.logging.stackdriver.job.driver.enable": "true",
                "dataproc:dataproc.logging.stackdriver.enable": "true",
                "dataproc:jobs.file-backed-output.enable": "true"
            },
            "optional_components": []
        },
        "lifecycle_config": {
            "auto_delete_ttl": "86400s"
        },
        "initialization_actions": [
            {
                "executable_file": "gs://some-init-script"
            }
        ]
    },
    "project_id": "project_id"
}
Package versions I am using:
Am I doing something wrong here? Is it an issue with wrong package versions, or is it even a bug?
Upvotes: 4
Views: 1283
Reputation: 4457
You should use the "100s" format for a duration type when you construct the protobuf in a text format (i.e. JSON, etc.), but you are using a Python object to construct the API request body. That is why you need to create a Duration object instead of a string:
duration_message.FromSeconds(86400)
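Applied to your request, this could look roughly as follows (a sketch, assuming cluster is the dict from the question rather than a string, and that only the lifecycle_config value needs to change):
from google.protobuf.duration_pb2 import Duration

# Build a Duration message for one day instead of the string "86400s"
ttl = Duration()
ttl.FromSeconds(86400)

# Put the Duration message into the cluster dict before sending the request
cluster["config"]["lifecycle_config"]["auto_delete_ttl"] = ttl

operation = client.create_cluster(project, region, cluster)
With a message instance in that field, the client library should be able to merge it into the request directly instead of failing on the string.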
Upvotes: 4