James

Reputation: 2331

How can I run Presto on Google Cloud Dataproc?

I want to run Presto on a Dataproc cluster, or on Google Cloud Platform in general. How can I easily install and set up Presto, especially for use with Hive?

Upvotes: 0

Views: 1441

Answers (2)

Dagang Wei

Reputation: 26538

There is now an official tutorial, "Use Presto with Google Cloud Dataproc". In essence, you can:

  1. Create a cluster with Presto init action:

gcloud dataproc clusters create presto-cluster \
    --project=${PROJECT} \
    --zone=${ZONE} \
    --num-workers=${WORKERS} \
    --scopes=cloud-platform \
    --initialization-actions=gs://dataproc-initialization-actions/presto/presto.sh

  2. Create an SSH tunnel from your local machine to the master node:

gcloud compute ssh presto-cluster-m \
    --project=${PROJECT} \
    --zone=${ZONE} \
    -- -D 1080 -N

  3. Connect to the Presto coordinator with the Presto CLI through the SSH tunnel:

./presto-cli \
    --server presto-cluster-m:8080 \
    --socks-proxy localhost:1080 \
    --catalog hive \
    --schema default
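
Once the tunnel is up, a quick way to confirm the connection works is to run a single query non-interactively. The following is only a sketch under stated assumptions: the SSH tunnel described above is already listening on localhost:1080, `./presto-cli` sits in the current directory, and the cluster is named `presto-cluster`. The `run` helper and the `DRY_RUN` switch are illustrative additions, not part of the tutorial; by default the script only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch: run one query through the SOCKS proxy instead of opening a session.
# Assumed placeholders: cluster name and proxy address (override via env).
CLUSTER="${CLUSTER:-presto-cluster}"
PROXY="${PROXY:-localhost:1080}"

# Print the command when DRY_RUN=1 (the default here); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# List the schemas in the hive catalog as a connectivity check.
smoke_test() {
  run ./presto-cli \
    --server "${CLUSTER}-m:8080" \
    --socks-proxy "${PROXY}" \
    --catalog hive \
    --execute "SHOW SCHEMAS;"
}

smoke_test
```

Run it with `DRY_RUN=0` to actually execute the query; if the Hive catalog is configured, the output should include the `default` schema.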

Upvotes: 0

James

Reputation: 2331

You can use an initialization action with a Cloud Dataproc cluster to quickly install and configure Presto. There is a GitHub repository of initialization actions, and it includes one for Presto.

If you want to use the Presto web UI, once the cluster is online you can follow these directions to create an SSH tunnel and SOCKS proxy to the cluster. From there, you can access Presto on port 8080 of the master node (the default, unless you change it).
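
As a rough illustration of the tunnel-plus-proxy approach, the sketch below opens the tunnel in one terminal and starts a browser through the proxy in another. Everything named here is an assumption: the cluster name `presto-cluster`, the project and zone defaults, proxy port 1080, and a Chrome binary called `google-chrome` (the binary name and path vary by OS). The `run` helper and `DRY_RUN` switch are illustrative; by default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: reach the Presto web UI on the master node through a SOCKS proxy.
# All values below are placeholders; override them via environment variables.
CLUSTER="${CLUSTER:-presto-cluster}"
PROJECT="${PROJECT:-my-project}"
ZONE="${ZONE:-us-central1-a}"
PROXY_PORT="${PROXY_PORT:-1080}"
CHROME="${CHROME:-google-chrome}"

# Print the command when DRY_RUN=1 (the default here); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Terminal 1: SSH to the master node with dynamic port forwarding (SOCKS proxy).
open_tunnel() {
  run gcloud compute ssh "${CLUSTER}-m" \
    --project="${PROJECT}" \
    --zone="${ZONE}" \
    -- -D "${PROXY_PORT}" -N
}

# Terminal 2: start a browser with a separate profile that routes via the proxy.
open_ui() {
  run "${CHROME}" \
    --proxy-server="socks5://localhost:${PROXY_PORT}" \
    --user-data-dir="/tmp/${CLUSTER}-m" \
    "http://${CLUSTER}-m:8080"
}

# Pick the step with the first argument: "tunnel" (default) or "ui".
case "${1:-tunnel}" in
  tunnel) open_tunnel ;;
  ui)     open_ui ;;
esac
```

Because `-N` keeps the SSH session in the foreground, the two steps are split: run the script with `tunnel` in one terminal and with `ui` in a second one, setting `DRY_RUN=0` in both.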

Upvotes: 1
