noypikobe24

Reputation: 105

spark/pyspark integration with HBase

Is it possible to connect Spark 2.4.3 to a remote HBase 1.3.2 server?

I've tried using this version:

https://repo.hortonworks.com/content/repositories/releases/com/hortonworks/shc-core/1.1.1-2.1-s_2.11/

but there seems to be a compatibility issue:

java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;

spark-submit --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 --repositories http://repo.hortonworks.com/content/groups/public/ /hbase-read.py

hbase-read.py is just a simple read for testing:

from pyspark.sql import SQLContext, SparkSession

spark = SparkSession \
        .builder \
        .appName("test") \
        .enableHiveSupport() \
        .getOrCreate()

sc = spark.sparkContext
sqlc = SQLContext(sc)

# Fully qualified name of the SHC data source
data_source_format = 'org.apache.spark.sql.execution.datasources.hbase'

# SHC catalog mapping the HBase table 'testtable' to a DataFrame schema.
# The split()/join('') trick just strips the whitespace out of the literal.
catalog = ''.join("""{
    "table":{"namespace":"default", "name":"testtable"},
    "rowkey":"key",
    "columns":{
        "col0":{"cf":"rowkey", "col":"key", "type":"string"},
        "col1":{"cf":"cf", "col":"col1", "type":"string"}
    }
}""".split())

df = sqlc.read.options(catalog=catalog).format(data_source_format).load()
df.show()
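As an aside, the `''.join(….split())` trick only removes whitespace between tokens, and since none of the catalog's string values contain spaces the compacted result is still valid JSON. That can be checked in plain Python, no Spark needed:

```python
import json

# Same whitespace-stripping trick as in hbase-read.py: split() breaks the
# literal on any whitespace and join('') glues the pieces back together.
catalog = ''.join("""{
    "table":{"namespace":"default", "name":"testtable"},
    "rowkey":"key",
    "columns":{
        "col0":{"cf":"rowkey", "col":"key", "type":"string"},
        "col1":{"cf":"cf", "col":"col1", "type":"string"}
    }
}""".split())

# Parse it back to prove the compacted string is still well-formed JSON.
parsed = json.loads(catalog)
print(parsed["table"]["name"])    # testtable
print(sorted(parsed["columns"]))  # ['col0', 'col1']
```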

I know this shc-core version works with Spark 2.3.3, but what are my alternative options for 2.4+?

I've built shc-core from source, but when I reference the jar I receive this error:

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.TableDescriptor

even though I've referenced all the necessary jars:

spark-submit --jars /shc/core/target/shc-core-1.1.3-2.4-s_2.11.jar,/hbase-jars/hbase-client-1.3.2.jar /hbase-read.py

Upvotes: 0

Views: 1069

Answers (1)

Teja Parimi

Reputation: 111

1) Is it possible to connect Spark 2.4.3 to a remote HBase 1.3.2 server?

Yes, it is possible. You can connect using either the HBase client API or shc-core.

2) java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.parse(Lorg/json4s/JsonInput;Z)Lorg/json4s/JsonAST$JValue;

This means there is another json4s jar with a different version on the classpath. Check the full stack trace to see which class makes the call, then remove the extra jar.
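One way to hunt for the clashing jar is to list every json4s artifact Spark could load. This is only a sketch; the `$SPARK_HOME` and ivy-cache paths are examples and should be adjusted to your installation:

```shell
# Look for every json4s jar Spark might load: the Spark distribution's own
# jars plus anything --packages pulled into the local ivy cache.
find "${SPARK_HOME:-/opt/spark}/jars" ~/.ivy2/jars -name 'json4s*.jar' 2>/dev/null
# If this prints more than one version of json4s-core, that version
# mismatch is the usual cause of the NoSuchMethodError above.
```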

3) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.client.TableDescriptor

The jar shc-core-1.1.3-2.4-s_2.11.jar is built against HBase >= 2.0, which is where the TableDescriptor class was introduced. HBase 1.3.2 has no such class; it has HTableDescriptor instead. If you want to use the latest shc-core, you have to run HBase >= 2.0. If your HBase version is < 2.0, use a compatible shc-core version (<= v1.1.2-2.2).
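For an HBase 1.x cluster, that means pinning a pre-2.0 build of shc-core. A sketch of the submit command, using the v1.1.2-2.2 version mentioned above (verify the exact artifact exists in the Hortonworks repo before relying on it):

```shell
# Sketch: pull an shc-core build compiled against HBase 1.x instead of 2.x.
spark-submit \
  --packages com.hortonworks:shc-core:1.1.2-2.2-s_2.11 \
  --repositories http://repo.hortonworks.com/content/groups/public/ \
  /hbase-read.py
```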

4) I know this shc-core version works with Spark 2.3.3, but what are my alternative options for 2.4+?

shc-core is pretty straightforward; it works with Spark 2.4 as well. It provides the SQL plan that tells Spark how to convert columns of different types to bytes and back. Just make sure you pick the jar that matches your HBase version.

Upvotes: 2
