user3858193

Reputation: 1518

Google Cloud Dataproc not able to access Hive metastore on Cloud SQL with --scopes=cloud-platform

I have created two Dataproc clusters. The requirement is to use a single Hive metastore that both clusters can access. The first is an ETL cluster created with --scopes=sql-admin, and the second, for ML users, is created with --scopes=cloud-platform. The databases and tables created from the ETL cluster are not visible from the ML cluster. Can anyone tell me whether I have to add --scopes=sql-admin to each cluster?

ETL Cluster Create Command:

gcloud dataproc clusters create amlgcbuatbi-report \
    --project=${PROJECT} \
    --master-machine-type n1-standard-1 --worker-machine-type n1-standard-1 --master-boot-disk-size 50 --worker-boot-disk-size 50 \
    --zone=${ZONE} \
    --num-workers=${WORKERS} \
    --scopes=sql-admin \
    --image-version=1.3 \
    --initialization-actions=gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh \
    --properties=hive:hive.metastore.warehouse.dir=gs://gftat/data \
    --metadata="hive-metastore-instance=$PROJECT:$REGION:metaore-dev001"

Output:

0: jdbc:hive2://localhost:10000/default> show databases;
+------------------+
|  database_name   |
+------------------+
| default          |
| gcb_dw           |
| l1_gcb_trxn_raw  |
+------------------+

ML Cluster Create Command:

gcloud dataproc clusters create amlgcbuatbi-ml \
    --project=${PROJECT} \
    --master-machine-type n1-standard-1 --worker-machine-type n1-standard-1 --master-boot-disk-size 50 --worker-boot-disk-size 50 \
    --zone=${ZONE} \
    --num-workers=${WORKERS} \
    --scopes=cloud-platform \
    --image-version=1.3 \
    --optional-components=PRESTO \
    --initialization-actions=gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh \
    --initialization-actions=gs://dataproc-initialization-actions/presto/presto.sh \
    --metadata="hive-metastore-instance=$PROJECT:$REGION:metaore-dev001"

Output (here I cannot see the databases and tables created from the ETL cluster):

0: jdbc:hive2://localhost:10000/default> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+

Upvotes: 1

Views: 1271

Answers (1)

Dennis Huo

Reputation: 10677

The --initialization-actions flag takes a comma-separated list; repeating the flag does not append further initialization actions to the list. Try

--initialization-actions=gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh,gs://dataproc-initialization-actions/presto/presto.sh

instead of two separate --initialization-actions flags.
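As a rough sketch of how this could be wired into the ML cluster command from the question: the snippet below builds the comma-separated value once and passes it in a single flag. The helper names join_by_comma and create_ml_cluster are made up for illustration, and the remaining flags are simply copied from the question. If the repeated flag meant the cloud-sql-proxy action was not applied (an assumption, not something confirmed here), that would be consistent with the ML cluster showing only the default database.

```shell
# Hypothetical helper: join all arguments with commas (POSIX sh, no arrays).
join_by_comma() {
  # "$*" joins the arguments using the first character of IFS;
  # the subshell keeps the IFS change from leaking out.
  ( IFS=,; printf '%s\n' "$*" )
}

# Both initialization actions from the question, as one comma-separated value.
INIT_ACTIONS=$(join_by_comma \
  gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh \
  gs://dataproc-initialization-actions/presto/presto.sh)

# Print the flag so it can be inspected before creating anything.
echo "--initialization-actions=${INIT_ACTIONS}"

# Hypothetical wrapper around the ML cluster command from the question.
# It is only defined here, not called; PROJECT, ZONE, REGION and WORKERS
# are assumed to be set as in the original commands.
create_ml_cluster() {
  gcloud dataproc clusters create amlgcbuatbi-ml \
    --project="${PROJECT}" \
    --master-machine-type n1-standard-1 --worker-machine-type n1-standard-1 \
    --master-boot-disk-size 50 --worker-boot-disk-size 50 \
    --zone="${ZONE}" \
    --num-workers="${WORKERS}" \
    --scopes=cloud-platform \
    --image-version=1.3 \
    --optional-components=PRESTO \
    --initialization-actions="${INIT_ACTIONS}" \
    --metadata="hive-metastore-instance=${PROJECT}:${REGION}:metaore-dev001"
}
```

After recreating the cluster this way, running show databases; again from the ML cluster is a quick way to check whether the shared metastore is now in use.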

Upvotes: 2
