Reputation: 85
When you start a Hive session in Dataproc, you can add jars that live in a GCS bucket:
add jar gs://my-bucket/serde.jar;
I don't want to have to add all the jars I need each time I start a Hive session, so I tried adding the jar paths to hive-site.xml in the hive.aux.jars.path property:
<property>
  <name>hive.aux.jars.path</name>
  <value>gs://my-bucket/serde.jar</value>
</property>
Then I get this error when trying to start a Hive session:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file://gs, expected: file:///
Is there a way to add custom jars that live in a GCS bucket to the Hive classpath, or would I have to copy the jars from my bucket and update hive.aux.jars.path each time I create a Dataproc cluster?
Edit:
Even after adding the property below and restarting Hive, I still get the same error.
<property>
  <name>hive.exim.uri.scheme.whitelist</name>
  <value>hdfs,pfile,gs</value>
  <final>false</final>
</property>
Upvotes: 1
Views: 921
Reputation: 4457
This is a known Hive bug (HIVE-18871): hive.aux.jars.path supports only local paths in Hive 3.1 and lower.
A workaround is to use a Dataproc initialization action that copies the jars from GCS to the same local FS path on all Dataproc cluster nodes, and to specify that local path as the value of the hive.aux.jars.path property; a sketch follows below.
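A minimal sketch of such an initialization action, assuming the bucket and jar from the question; the target directory /opt/hive-aux-jars and the script name copy-hive-jars.sh are placeholders for illustration:

#!/bin/bash
# Dataproc initialization action: runs on every node at cluster creation.
set -euxo pipefail

# Copy the aux jar(s) from GCS to the same local path on all nodes.
mkdir -p /opt/hive-aux-jars
gsutil cp gs://my-bucket/serde.jar /opt/hive-aux-jars/

Upload the script to GCS and reference it at cluster creation time, along with the local path in the Hive property (the hive: prefix in --properties maps the key into hive-site.xml):

gcloud dataproc clusters create my-cluster \
    --initialization-actions gs://my-bucket/copy-hive-jars.sh \
    --properties 'hive:hive.aux.jars.path=/opt/hive-aux-jars/serde.jar'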
The HIVE-18871 fix was backported to Dataproc 1.3+ images, so you can use GCS URIs in the hive.aux.jars.path property with newer Dataproc images that include this fix.
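For example, a sketch assuming a 1.3+ image and the bucket/jar from the question (my-cluster is a placeholder name):

gcloud dataproc clusters create my-cluster \
    --image-version 1.3 \
    --properties 'hive:hive.aux.jars.path=gs://my-bucket/serde.jar'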
Upvotes: 1
Reputation: 74
I guess you also need to set the hive.exim.uri.scheme.whitelist property to whitelist the gs URI scheme.
So in your case, while creating the Dataproc cluster, set these properties:
hive.aux.jars.path = gs://my-bucket/serde.jar
hive.exim.uri.scheme.whitelist = hdfs,pfile,gs
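One way to set both at cluster creation time is via gcloud (a sketch; my-cluster is a placeholder, and the leading ^#^ switches gcloud's delimiter to # so the comma-separated whitelist value isn't split, per gcloud topic escaping):

gcloud dataproc clusters create my-cluster \
    --properties '^#^hive:hive.aux.jars.path=gs://my-bucket/serde.jar#hive:hive.exim.uri.scheme.whitelist=hdfs,pfile,gs'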
Upvotes: 0