Reputation: 53
So, I'm using gcloud dataproc, Hive and Spark on my project, but apparently I can't connect to the Hive metastore.
The tables are populated correctly and all the data is there; for example, the table I'm trying to access now is the one shown in the image, and as you can see the Parquet file is there (stored as Parquet). Sparktp2-m is the master of the Dataproc cluster.
Next, I have a project on IntelliJ that will have some queries on it but first I need to access this hive data and it's not going well. I'm trying to access it like this:
SparkSession spark = SparkSession
        .builder()
        .appName("Check")
        .config("hive.metastore.uris", "thrift://hive-metastore:9083")
        .enableHiveSupport()
        .getOrCreate();
JavaPairRDD<Tuple2<Object, String>, Integer> mr = spark.table("title_basics_parquet").toJavaRDD()...
Next, I build the jar and submit it as a job like this:
gcloud dataproc jobs submit spark --jars target/GGCD_Spark-1.0-SNAPSHOT.jar --class parte1.Queries --cluster sparktp2 --region europe-west1
And the error is:
Am I missing something, or is it the wrong URI?
Upvotes: 2
Views: 1799
Reputation: 26458
The default Hive Metastore URI on Dataproc is thrift://&lt;master-node-hostname&gt;:9083.
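As a minimal sketch, assuming the master hostname sparktp2-m from the question (depending on your network setup the fully qualified hostname may be required), the builder would look like this:

```java
import org.apache.spark.sql.SparkSession;

public class Queries {
    public static void main(String[] args) {
        // Point hive.metastore.uris at the Dataproc master node,
        // where the metastore thrift service listens on port 9083.
        SparkSession spark = SparkSession
                .builder()
                .appName("Check")
                .config("hive.metastore.uris", "thrift://sparktp2-m:9083")
                .enableHiveSupport()
                .getOrCreate();

        // Sanity check: read the Hive table mentioned in the question.
        spark.table("title_basics_parquet").show(5);
        spark.stop();
    }
}
```

Note also that when the job runs on the cluster itself (via gcloud dataproc jobs submit spark), the cluster's Hive defaults are usually already on the classpath, so calling enableHiveSupport() without setting hive.metastore.uris at all may be enough.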
Upvotes: 1