Reputation: 539
I am storing a DataFrame to an HBase table from PySpark on CDP7, following this example. The components that I use are:
The command that I use:
spark3-submit --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ --files /etc/hbase/conf/hbase-site.xml test-hbase3.py
However, I got this error. It is quite long, so I put the full log on hastebin.com here: spark-log
error snippet:
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/test-hbase3.py", line 45, in <module>
main()
File "/opt/cloudera/parcels/CDH-7.1.6-1.cdh7.1.6.p0.10506313/test-hbase3.py", line 24, in main
writeDF.write.options(catalog=writeCatalog, newtable=5).format(dataSourceFormat).save()
File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1107, in save
File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/opt/cloudera/parcels/SPARK3-3.1.1.3.1.7270.0-253-1.p0.11638568/lib/spark3/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.save.
: java.lang.NoClassDefFoundError: scala/Product$class
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.<init>(HBaseRelation.scala:73)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:59)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
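For reference, the write that triggers the failure follows the usual SHC pattern: a JSON catalog mapping DataFrame columns to an HBase table layout, passed to the SHC data source. A minimal sketch of that pattern (the table name, column family, and column names here are placeholders, not my actual schema):

```python
import json

# SHC catalog: maps DataFrame columns to an HBase table layout.
# Table/column names below are placeholders.
writeCatalog = json.dumps({
    "table": {"namespace": "default", "name": "test_table"},
    "rowkey": "key",
    "columns": {
        "key":  {"cf": "rowkey", "col": "key",  "type": "string"},
        "col1": {"cf": "cf1",    "col": "col1", "type": "string"},
    },
})

# The write itself (requires a SparkSession and the SHC jar on the classpath):
# writeDF.write \
#     .options(catalog=writeCatalog, newtable=5) \
#     .format("org.apache.spark.sql.execution.datasources.hbase") \
#     .save()
```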
What should I do to fix the error? I tried to find another connector but only found the SHC connector. I'm not using any Maven repo here, and I'm not sure whether there is a missing dependency or some other error.
Upvotes: 0
Views: 460
Reputation: 6907
This is a Scala version conflict: your shc-core jar is compiled for Scala 2.11, but you're running Scala 2.12, which is not binary compatible with 2.11.
The easiest fix would be to recompile shc-core from source for Scala 2.12 (although you may still run into compatibility issues, since the project is obviously not tested with Scala 2.12).
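You can spot the mismatch from the artifact coordinates alone: Spark 3.1.x is built against Scala 2.12, while the `s_2.11` suffix in `shc-core:1.1.1-2.1-s_2.11` marks a Scala 2.11 build. A rough helper illustrating the check (the `_2.xx` suffix is the standard Maven convention for Scala artifacts):

```python
import re

def scala_suffix(artifact):
    """Extract the Scala binary version (e.g. '2.11') from a Maven
    coordinate that uses the _2.xx suffix convention, or None."""
    m = re.search(r"_(2\.\d+)", artifact)
    return m.group(1) if m else None

shc = scala_suffix("com.hortonworks:shc-core:1.1.1-2.1-s_2.11")  # '2.11'
spark_scala = "2.12"  # Spark 3.1.x is compiled against Scala 2.12
print(shc == spark_scala)  # False -> binary incompatible, hence NoClassDefFoundError
```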
Other ways you can explore to solve your issue:
accessing HBase through a Hive table (https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/hbase-data-access/content/hdag_hbase_hive_integration_example.html)
using something like Phoenix connector (https://docs.cloudera.com/runtime/7.2.8/phoenix-access-data/topics/phoenix-spark-connector-examples.html)
rolling your own connector with the HBase API for your DataFrame; it's fairly straightforward if you're not trying to implement a fully generic solution
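As an illustration of that last option, here is a rough sketch of a hand-rolled writer using happybase (a third-party HBase Thrift client; the table name, column family, and Thrift host below are assumptions). The pure row-to-cells conversion is separated out so the HBase call stays a thin wrapper:

```python
def row_to_put(row, rowkey_field="key", cf="cf1"):
    """Convert a dict-like Spark Row into an HBase (rowkey, cells) pair.
    Every non-key field becomes a cell in column family `cf`."""
    d = row.asDict() if hasattr(row, "asDict") else dict(row)
    rowkey = str(d.pop(rowkey_field)).encode()
    cells = {"{}:{}".format(cf, k).encode(): str(v).encode()
             for k, v in d.items()}
    return rowkey, cells

def write_partition(rows):
    # Runs on each executor; requires `pip install happybase` and a
    # running HBase Thrift server (host below is a placeholder).
    import happybase
    conn = happybase.Connection("thrift-host")
    table = conn.table("test_table")
    with table.batch(batch_size=1000) as batch:
        for row in rows:
            rowkey, cells = row_to_put(row)
            batch.put(rowkey, cells)
    conn.close()

# writeDF.rdd.foreachPartition(write_partition)
```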
Upvotes: 1