Reputation: 131
I have a Dataproc cluster running on Google Cloud Platform. I want to pass this cluster to the Dask client instead of initializing a new dask-yarn cluster.
However, I am not able to use my Dataproc cluster directly.
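# Instead of:
from dask_yarn import YarnCluster
from dask.distributed import Client
cluster = YarnCluster(environment='environment.tar.gz', worker_vcores=2, worker_memory="8GiB")
cluster.scale(10)
client = Client(cluster)
# I would like to use my Dataproc cluster directly (pseudocode, this does not work):
client = Client(my-dataproc-cluster)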
Upvotes: 2
Views: 1185
Reputation: 2445
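Dataproc creates a new Hadoop cluster, whereas dask-yarn is for creating Dask clusters that run inside your Hadoop cluster (wherever that may be). The two are not interchangeable: a Dataproc cluster is not itself a Dask cluster, so you would still create a YarnCluster from a node inside it. To run properly, dask-yarn requires correctly set-up Python environments and configuration, just as any other tool on Hadoop would (Spark included).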
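As a rough sketch of what that looks like in practice, the snippet below wraps the usual dask-yarn startup in a function. It assumes the code runs on a node inside the Hadoop cluster (for example the Dataproc master), that dask-yarn is installed there, and that a packed environment such as `environment.tar.gz` is available. The function name `start_dask_on_yarn` and its defaults are made up for illustration.

```python
def start_dask_on_yarn(environment="environment.tar.gz", n_workers=10):
    """Start a Dask cluster on YARN and return (cluster, client).

    Hypothetical helper: assumes it is called from a node inside the
    Hadoop cluster with dask-yarn installed and YARN configured.
    """
    # Imported lazily so this module can be loaded on machines
    # where dask-yarn is not installed.
    from dask_yarn import YarnCluster
    from dask.distributed import Client

    # Request worker containers from YARN using the packed environment.
    cluster = YarnCluster(
        environment=environment,
        worker_vcores=2,
        worker_memory="8GiB",
    )
    cluster.scale(n_workers)

    # The client connects to the Dask scheduler started by dask-yarn,
    # not to the Dataproc cluster itself.
    client = Client(cluster)
    return cluster, client
```

On the cluster you would call `cluster, client = start_dask_on_yarn()`, submit work through `client`, and finish with `client.close()` and `cluster.shutdown()` to release the YARN containers.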
We don't have a Dataproc-specific guide, but the one for AWS's equivalent, EMR, is here: http://yarn.dask.org/en/latest/aws-emr.html
For deploying on Dataproc, you'd likely create an initialization action equivalent to the EMR bootstrap action: https://github.com/dask/dask-yarn/blob/master/deployment_resources/aws-emr/bootstrap-dask
Upvotes: 3