dses

Reputation: 85

How to add hive auxiliary jars to Dataproc cluster

When you start a Hive session on Dataproc, you can add jars that live in a GCS bucket:
add jar gs://my-bucket/serde.jar;

I don't want to have to add all the jars I need each time I start a Hive session, so I tried adding the jar paths to the hive.aux.jars.path property in hive-site.xml.

<property>
  <name>hive.aux.jars.path</name>
  <value>gs://my-bucket/serde.jar</value>
</property>

Then I hit this error when trying to start a Hive session:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file://gs, expected: file:///

Is there a way to add custom jars that live in a GCS bucket to the Hive classpath, or would I have to copy the jars from my bucket and update hive.aux.jars.path each time I create a Dataproc cluster?

Edit:
Even after adding the property below and restarting Hive, I still get the same error.

  <property>
    <name>hive.exim.uri.scheme.whitelist</name>
    <value>hdfs,pfile,gs</value>
    <final>false</final>
  </property>

Upvotes: 1

Views: 921

Answers (2)

Igor Dvorzhak

Reputation: 4457

This is a known Hive bug (HIVE-18871): hive.aux.jars.path supports only local paths in Hive 3.1 and earlier.

A workaround is to use a Dataproc initialization action that copies the jars from GCS to the same local FS path on all Dataproc cluster nodes, and then specify that local path as the value of the hive.aux.jars.path property.
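A minimal sketch of such an initialization action, assuming a placeholder bucket, jar name, and local directory:

```shell
#!/bin/bash
# Hypothetical init action: copy auxiliary jars from GCS to the same local
# path on every node, so hive.aux.jars.path can point at the local FS.
set -euxo pipefail

AUX_JAR_DIR=/usr/lib/hive/aux-jars
mkdir -p "${AUX_JAR_DIR}"
gsutil cp gs://my-bucket/serde.jar "${AUX_JAR_DIR}/"
```

You would upload this script to GCS, pass it via `--initialization-actions` when creating the cluster, and set hive.aux.jars.path to the local path (e.g. `file:///usr/lib/hive/aux-jars/serde.jar`).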

Update

The HIVE-18871 fix was backported to Dataproc 1.3+ images, so you can use GCS URIs in the hive.aux.jars.path property with newer Dataproc images that include this fix.

Upvotes: 1

Animesh

Reputation: 74

I guess you also need to set the hive.exim.uri.scheme.whitelist property to whitelist the GCS URI scheme.

So in your case, while creating the Dataproc cluster, set these properties:

hive.aux.jars.path = gs://my-bucket/serde.jar
hive.exim.uri.scheme.whitelist = hdfs,pfile,gs
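A sketch of setting both properties at cluster-creation time (cluster name, region, and bucket are placeholders). The `hive:` prefix routes each property into hive-site.xml, and the leading `^#^` switches the `--properties` list delimiter to `#`, since the whitelist value itself contains commas (see `gcloud topic escaping`):

```shell
gcloud dataproc clusters create my-cluster \
  --region us-central1 \
  --properties '^#^hive:hive.aux.jars.path=gs://my-bucket/serde.jar#hive:hive.exim.uri.scheme.whitelist=hdfs,pfile,gs'
```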

Upvotes: 0
