Sagar Sitap

Reputation: 1

Java Spark Bigtable connector to write dataset to Bigtable table

Error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/TableDescriptor
    at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:78)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
    at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
    at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
    at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:859)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:388)
    at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:361)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:248)
    at com.test.MassmartSparkJob.writeToBigtable(MassmartSparkJob.java:293)
    at com.test.MassmartSparkJob.runSparkJob(MassmartSparkJob.java:261)
    at com.test.MassmartSparkJob.main(MassmartSparkJob.java:193)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1029)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.TableDescriptor
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:527)
    ... 45 more

Code:

// HBaseTableCatalog comes from the hbase-spark connector
import org.apache.hadoop.hbase.spark.datasources.HBaseTableCatalog;

datasetRecords
        .write()
        .format("org.apache.hadoop.hbase.spark")
        .option(HBaseTableCatalog.tableCatalog(), createCatalogJSON())
        .save();
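For reference, createCatalogJSON() is not shown here; with the hbase-spark connector it would return a JSON catalog string mapping Dataset columns to HBase column families. A minimal sketch, with hypothetical table and column names:

    // Hypothetical catalog: table, row key, and column names are placeholders.
    private static String createCatalogJSON() {
      return "{"
          + "\"table\":{\"namespace\":\"default\", \"name\":\"my_table\"},"
          + "\"rowkey\":\"key\","
          + "\"columns\":{"
          + "\"id\":{\"cf\":\"rowkey\", \"col\":\"key\", \"type\":\"string\"},"
          + "\"name\":{\"cf\":\"cf1\", \"col\":\"name\", \"type\":\"string\"}"
          + "}}";
    }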

pom.xml:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>3.5.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.13</artifactId>
      <version>3.5.1</version>
    </dependency>
    <!-- Bigtable HBase dependencies -->
    <!-- https://mvnrepository.com/artifact/com.google.cloud.bigtable/bigtable-hbase-2.x -->
    <dependency>
      <groupId>com.google.cloud.bigtable</groupId>
      <artifactId>bigtable-hbase-2.x</artifactId>
      <version>2.13.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.google.cloud.bigtable/bigtable-hbase-2.x-hadoop -->
    <dependency>
      <groupId>com.google.cloud.bigtable</groupId>
      <artifactId>bigtable-hbase-2.x-hadoop</artifactId>
      <version>2.13.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-client</artifactId>
      <version>2.5.8</version>
      <scope>compile</scope>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.hbase.connectors.spark/hbase-spark -->
    <dependency>
      <groupId>org.apache.hbase.connectors.spark</groupId>
      <artifactId>hbase-spark</artifactId>
      <version>1.0.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>3.3.0</version>
    </dependency>

I tried creating an instance of org.apache.hadoop.hbase.client.TableDescriptor explicitly to check whether TableDescriptor is on the classpath or not. I see there is no problem loading the TableDescriptor class.
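A sketch of such a check (assuming it runs in the same JVM as the Spark driver; since TableDescriptor is an interface, an instance has to come from TableDescriptorBuilder):

    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.TableDescriptor;
    import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

    public final class ClasspathCheck {
      public static void main(String[] args) throws Exception {
        // Resolve the class the connector fails to find; throws ClassNotFoundException if absent.
        Class<?> clazz = Class.forName("org.apache.hadoop.hbase.client.TableDescriptor");
        // Print which JAR the class was loaded from, to spot duplicate or conflicting HBase jars.
        System.out.println(clazz.getProtectionDomain().getCodeSource().getLocation());

        // TableDescriptor is an interface; instances are built with TableDescriptorBuilder.
        TableDescriptor td = TableDescriptorBuilder.newBuilder(TableName.valueOf("probe")).build();
        System.out.println(td.getTableName());
      }
    }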

But somehow, inside the Spark Bigtable connector's code path, it still fails with Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.TableDescriptor.

Upvotes: 0

Views: 74

Answers (1)

Karegar

Reputation: 21

I'm not sure if this is the root cause, since I can't see the full pom.xml file, but this can happen if you are not building an uber JAR that bundles all of the relevant dependencies before submitting the job to Spark (for example with the maven-shade-plugin, as sketched below).
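A minimal maven-shade-plugin configuration along these lines (the plugin version is illustrative; the services transformer matters because Spark data sources are discovered via META-INF/services):

    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <version>3.5.1</version>
          <executions>
            <execution>
              <phase>package</phase>
              <goals>
                <goal>shade</goal>
              </goals>
              <configuration>
                <transformers>
                  <!-- Merge META-INF/services entries so DataSource registration survives shading. -->
                  <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>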

In general, using the hbase-spark connector with the bigtable-hbase client can introduce a lot of dependency-management complications. I recommend using the native Bigtable Spark connector instead, which was launched recently and is designed to address these issues. The official user guide at https://cloud.google.com/bigtable/docs/use-bigtable-spark-connector covers the connector's features and has examples of how to use it.
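As an illustration, a write with the native connector looks roughly like the following (based on that guide; the project ID, instance ID, and catalog values are placeholders, and the option names should be verified against the guide for your connector version):

    // Catalog JSON in the same style as HBaseTableCatalog; all names are placeholders.
    String catalog = "{"
        + "\"table\":{\"name\":\"my_table\"},"
        + "\"rowkey\":\"key\","
        + "\"columns\":{"
        + "\"id\":{\"cf\":\"rowkey\", \"col\":\"key\", \"type\":\"string\"},"
        + "\"name\":{\"cf\":\"cf1\", \"col\":\"name\", \"type\":\"string\"}"
        + "}}";

    datasetRecords
        .write()
        .format("bigtable")                                   // native Bigtable Spark connector
        .option("catalog", catalog)
        .option("spark.bigtable.project.id", "my-project")    // placeholder GCP project
        .option("spark.bigtable.instance.id", "my-instance")  // placeholder Bigtable instance
        .option("spark.bigtable.create.new.table", "true")    // create the table if it does not exist
        .save();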

(Disclaimer: I'm a developer in the Google Bigtable team.)

Upvotes: 2
