user3858193

Reputation: 1518

How to build a Google Cloud Dataproc edge node?

We are moving from an on-premises environment to Google Cloud Dataproc for Spark jobs. I am able to build the cluster and SSH to the master node to run jobs. I am not clear on how to build an edge node where users can log in and submit jobs. Is it going to be another GCE VM? Any thoughts or best practices?

Upvotes: 2

Views: 840

Answers (1)

rsantiago

Reputation: 2099

A new VM instance is a good way to reproduce the edge node role from other architectures. Some points to consider:

  • You can also execute your jobs directly from the master node, which you can make accessible through SSH.

  • You will need to find a balance between simplicity (SSH to the master node) and security (a separate edge node).

  • Note that IAM can allow individual users to submit jobs if you assign them the Dataproc Editor role.
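As a rough illustration of the IAM point, the sketch below builds the two `gcloud` commands involved: one that grants a user the Dataproc Editor role (`roles/dataproc.editor`) and one that the user could then run from an edge VM to submit a Spark job. The project ID, user email, cluster name, region, class name and jar URI are all hypothetical placeholders, and the helper function names are made up for this example.

```python
import shlex


def grant_dataproc_editor_cmd(project_id, user_email):
    """Build the gcloud command that grants a user the Dataproc Editor role."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=user:{user_email}",
        "--role=roles/dataproc.editor",
    ]


def submit_spark_job_cmd(cluster, region, main_class, jar_uri):
    """Build the gcloud command that submits a Spark job to an existing cluster."""
    return [
        "gcloud", "dataproc", "jobs", "submit", "spark",
        f"--cluster={cluster}",
        f"--region={region}",
        f"--class={main_class}",
        f"--jars={jar_uri}",
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    commands = [
        grant_dataproc_editor_cmd("my-project", "analyst@example.com"),
        submit_spark_job_cmd(
            "my-cluster", "us-central1",
            "com.example.MyJob", "gs://my-bucket/my-job.jar",
        ),
    ]
    for cmd in commands:
        # Print shell-safe command lines instead of running them.
        print(shlex.join(cmd))
```

The script only prints the commands, so an administrator can review them before running the role binding, and users can run the submit command from any VM that has the Google Cloud SDK installed.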

Don't forget that Dataproc also lets you create ephemeral clusters: you create a cluster, execute your job, and then delete the cluster.

Using ephemeral clusters avoids unnecessary costs. The script you create for this can even be executed from any machine that has the Google Cloud SDK installed, e.g. on-premises servers or your PC.
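A minimal sketch of such a script, assuming the Google Cloud SDK (`gcloud`) is installed and authenticated on the machine that runs it. The function names, cluster name, region, class name and jar URI are hypothetical, and a real cluster would probably need more `create` options (machine types, worker count, and so on).

```python
import subprocess


def ephemeral_job_commands(cluster, region, main_class, jar_uri):
    """Return the create, submit and delete gcloud commands for one job run."""
    create = ["gcloud", "dataproc", "clusters", "create", cluster,
              f"--region={region}"]
    submit = ["gcloud", "dataproc", "jobs", "submit", "spark",
              f"--cluster={cluster}", f"--region={region}",
              f"--class={main_class}", f"--jars={jar_uri}"]
    # --quiet skips the interactive confirmation prompt on delete.
    delete = ["gcloud", "dataproc", "clusters", "delete", cluster,
              f"--region={region}", "--quiet"]
    return create, submit, delete


def run_ephemeral_job(cluster, region, main_class, jar_uri, run=None):
    """Create a cluster, run one Spark job, and always delete the cluster."""
    if run is None:
        # Default runner: execute the command and raise on a non-zero exit.
        def run(cmd):
            subprocess.run(cmd, check=True)

    create, submit, delete = ephemeral_job_commands(
        cluster, region, main_class, jar_uri)
    run(create)
    try:
        run(submit)
    finally:
        # Delete the cluster even if the job fails, to avoid ongoing costs.
        run(delete)
```

For example, `run_ephemeral_job("tmp-cluster", "us-central1", "com.example.MyJob", "gs://my-bucket/my-job.jar")` would run the three steps in order (all argument values here are placeholders). The `run` parameter exists only so the sequence can be dry-run or tested without calling `gcloud`.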

Upvotes: 2
