Reputation: 1418
I'm using PySpark to read an HBase table as a DataFrame, but something went wrong:
from pyspark import SparkContext
from pyspark.sql import SparkSession

sc = SparkContext(master="local[*]", appName="test")
spark = SparkSession(sc).builder.getOrCreate()
df = spark.read.format('org.apache.hadoop.hbase.spark') \
    .option('hbase.table', 'h_table') \
    .option('hbase.columns.mapping',
            'life_id STRING :key, score STRING info:total_score') \
    .option('hbase.use.hbase.context', False) \
    .option('hbase.config.resources', 'file:///home/softs/hbase-2.0.5/conf/hbase-site.xml') \
    .option('hbase-push.down.column.filter', False) \
    .load()
df.show()
It throws:
java.lang.ClassNotFoundException: Failed to find data source: org.apache.hadoop.hbase.spark. Please find packages at http://spark.apache.org/third-party-projects.html
I followed the demo.
Upvotes: 1
Views: 3162
Reputation: 1631
The dependency is not packaged with your JAR. If you don't wish to bundle the dependency in your project, use the --packages flag of spark-submit to pass the Maven coordinates of the connector you are using.
Add the following options to your spark-submit command:
--packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/
and it should work.
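For example, assuming your script is saved as read_hbase.py (the file name here is hypothetical), the full invocation might look like this:

spark-submit \
  --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \
  read_hbase.py

Alternatively, the same coordinates can be passed through Spark configuration when building the session. A minimal sketch; note that spark.jars.packages and spark.jars.repositories only take effect if they are set before the SparkContext is created:

from pyspark.sql import SparkSession

# Pull the connector at startup instead of via spark-submit flags.
# These must be set before any SparkContext exists in the process.
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("test") \
    .config("spark.jars.packages", "com.hortonworks:shc-core:1.1.1-2.1-s_2.11") \
    .config("spark.jars.repositories", "http://repo.hortonworks.com/content/groups/public/") \
    .getOrCreate()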
Upvotes: 1