Reputation: 383
It's evident that preemptible instance are cheaper than non-preemptible instance. On daily basis 400-500 dataflow jobs are running in my organisational project. Out of which some jobs are time-sensitive and others are not. So is there any way I could use preemptible instance for non-time-constraint job, which will cost me less for overall pipeline execution. Currently I'm running dataflow jobs with below specified configuration.
options.setTempLocation("gs://temp/");
options.setRunner(DataflowRunner.class);
options.setTemplateLocation("gs://temp-location/");
options.setWorkerMachineType("n1-standard-4");
options.setMaxNumWorkers(20);
options.setWorkerCacheMb(2000);
I'm not able to find out any pipeline options with preemptible instance setting.
Upvotes: 7
Views: 1696
Reputation: 7058
Yes, it is possible to do so with Flexible Resource Scheduling in Cloud Dataflow (docs). Note that there are some things to consider:
QUEUED
status for your Dataflow jobs). They are run opportunistically when resources are available within a six-hour window. This makes FlexRS suitable to reduce cost for non-time-critical workloads. Also, be sure to validate your code before sending the job.You cannot set autoscalingAlgorithm=NONE
n1-standard-2
(default) and n1-highmem-16
.In order to run it, use --flexRSGoal=COST_OPTIMIZED
and make sure to take into account that the rest of parameters conform to the FlexRS needs.
A uniform discount rate is applied to FlexRS jobs, you can compare pricing details in the following link.
Note that you might see a Beta disclaimer in the non-English documentation but, as clarified in the release notes, it's Generally Available.
Upvotes: 10