oceansize

Reputation: 719

How To Run Apache Hudi Hive Sync Tool

The documentation at https://hudi.apache.org/docs/syncing_metastore is not very straightforward.

I've spent a lot of time trying to get this tool working. Whether I run it from the CLI (run-sync-tools.sh) or from IntelliJ (running HiveSyncTool directly), I always get a ClassNotFoundException for one class or another.

The first exception is ClassNotFoundException: org.slf4j.LoggerFactory. OK, I added that dependency explicitly, but then it continues with other classes.

In IntelliJ this happens because almost all dependencies are declared with provided scope. I had to change them to compile.

After resolving those exceptions I receive:

java.lang.NoSuchMethodError: 'org.apache.parquet.schema.LogicalTypeAnnotation org.apache.parquet.schema.Type.getLogicalTypeAnnotation()'

This looks like an incompatibility between the parquet and avro libraries. I tried different versions, but without success.
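A NoSuchMethodError like this usually means the class was loaded from a different (often older) jar than the one the caller was compiled against. As a rough diagnostic sketch, not part of Hudi, a small probe run with the same classpath can print which jar supplies org.apache.parquet.schema.Type and whether it exposes getLogicalTypeAnnotation. The class name ClasspathProbe and its helper methods are made up for illustration.

```java
import java.security.CodeSource;
import java.util.Arrays;

// Hypothetical helper: run it with the same classpath as HiveSyncTool.
public class ClasspathProbe {

    /** Returns the jar or directory a class was loaded from, or "unknown" (typical for JDK classes). */
    public static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
                ? "unknown"
                : src.getLocation().toString();
    }

    /** True if the class declares or inherits a public method with the given name. */
    public static boolean hasPublicMethod(Class<?> cls, String methodName) {
        return Arrays.stream(cls.getMethods())
                .anyMatch(m -> m.getName().equals(methodName));
    }

    public static void main(String[] args) {
        // The class named in the NoSuchMethodError above.
        String className = "org.apache.parquet.schema.Type";
        try {
            Class<?> cls = Class.forName(className);
            System.out.println("loaded from: " + locationOf(cls));
            System.out.println("has getLogicalTypeAnnotation: "
                    + hasPublicMethod(cls, "getLogicalTypeAnnotation"));
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is not on the classpath");
        }
    }
}
```

If the reported jar is not the parquet version expected, some other dependency or bundle on the classpath is probably shadowing it.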

The main question here: is there an easy way to run this tool? I don't believe it should be necessary to add missing dependencies or change Maven scopes. This is really weird.

Thanks in advance

Upvotes: 1

Views: 416

Answers (1)

Albert T. Wong

Reputation: 1653

See my answer at https://github.com/apache/incubator-xtable/discussions/457#discussioncomment-9659748.

The gist: download the following from mavenrepository.com

org.apache.hudi:hudi-hive-sync-bundle:0.14.1,com.amazonaws:aws-java-sdk-s3:1.11.271,org.apache.hadoop:hadoop-client:2.10.2,org.apache.hadoop:hadoop-aws:2.10.2

and you'll need a Hive 2.3.10 and Hadoop 2.10.2 installation.
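If you'd rather script the downloads than click through a website, here is a minimal sketch that turns the coordinates above into direct jar URLs. It assumes the artifacts are published on Maven Central under the standard repository layout (groupId with dots replaced by slashes, then artifactId, version, and artifactId-version.jar). The class name MavenCentralUrls is made up for illustration, and the URLs point only at the jars themselves, not at their transitive dependencies.

```java
// Hypothetical helper: maps "group:artifact:version" to a Maven Central jar URL.
public class MavenCentralUrls {

    // Assumption: the standard Maven Central base URL.
    static final String BASE = "https://repo1.maven.org/maven2/";

    /** Builds the jar URL for a coordinate of the form groupId:artifactId:version. */
    public static String jarUrl(String coordinate) {
        String[] parts = coordinate.trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "expected groupId:artifactId:version, got: " + coordinate);
        }
        String groupPath = parts[0].replace('.', '/');
        String artifact = parts[1];
        String version = parts[2];
        return BASE + groupPath + "/" + artifact + "/" + version + "/"
                + artifact + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // The coordinates listed in the answer above.
        String[] coordinates = {
            "org.apache.hudi:hudi-hive-sync-bundle:0.14.1",
            "com.amazonaws:aws-java-sdk-s3:1.11.271",
            "org.apache.hadoop:hadoop-client:2.10.2",
            "org.apache.hadoop:hadoop-aws:2.10.2"
        };
        for (String c : coordinates) {
            System.out.println(jarUrl(c));
        }
    }
}
```

Each printed URL can then be fetched with curl or wget into whatever directory the sync tool's classpath picks up.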

Upvotes: 1
