Ada Pongaya

Reputation: 415

Error while writing to HBase Table using PySpark

I am trying to write to an HBase table using PySpark. So far I have been able to read data from HBase, but I get an exception when writing to the table.

from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark.sql.types import *

properties = {
  "instanceId" : "hbase",
  "zookeepers" : "10-x-x-x.local:2181,10-x-x-x.local:2181,10-x-x-x.local:2181",
  "hbase.columns.mapping" : "KEY_FIELD STRING :key, A STRING c:a, B STRING c:b",
  "hbase.use.hbase.context" : False,
  "hbase.config.resources" : "file:///etc/hbase/conf/hbase-site.xml",
  "hbase.table"  : "t"
}

spark = SparkSession.builder.getOrCreate()

sc = spark.sparkContext

# I am able to read the data successfully.
#df = spark.read.format("org.apache.hadoop.hbase.spark")\
#    .options( **properties)\
#    .load()

data = [("3","DATA 3 A", "DATA 3 B")]
columns = ['KEY_FIELD','A','B']
cSchema = StructType([StructField(columnName, StringType()) for columnName in columns])
df = spark.createDataFrame(data, schema=cSchema)

# Writing is where the exception is raised.
df.write\
      .options( **properties)\
      .format("org.apache.hadoop.hbase.spark")\
      .save()

I execute the script with a command in the following format:

spark2-submit --master local[*]

Spark Version: 2.2.0.cloudera1 (I can't change my Spark version)
HBase Version: 1.2.0-cdh5.12.0 (but I can change my HBase version)

Note: I have added the HBase jars to the spark2 jar folder, and I have also added the following dependent jars to the same folder.

  1. spark-core_2.11-1.6.1.jar
  2. htrace-core-3.1.0-incubating.jar
  3. scala-library-2.9.1.jar

Error :

py4j.protocol.Py4JJavaError: An error occurred while calling
: java.lang.RuntimeException: org.apache.hadoop.hbase.spark.DefaultSource does not allow create table as select.
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:476)

I've tried multiple suggestions, but nothing has worked. This might be a duplicate question, but I have no other way to find an answer.

Upvotes: 0

Views: 2847

Answers (2)

Ada Pongaya

Reputation: 415

I solved it by compiling the SHC git repo and putting the resulting shc jar in the Spark jar folder, then following the link suggested by @Aniket Kulkarni.

The final code looks something like this:

from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark.sql.types import *

properties = {
  "instanceId" : "hbase",
  "zookeepers" : "10-x-x-x.local:2181,10-x-x-x.local:2181,10-x-x-x.local:2181",
  "hbase.columns.mapping" : "KEY_FIELD STRING :key, A STRING c:a, B STRING c:b",
  "hbase.use.hbase.context" : False,
  "hbase.config.resources" : "file:///etc/hbase/conf/hbase-site.xml",
  "hbase.table"  : "test_table"
}

spark = SparkSession.builder.getOrCreate()

sc = spark.sparkContext

# SHC catalog: maps each DataFrame column to an HBase column family and qualifier.
catalog = ''.join("""{
    "table":{"namespace":"default", "name":"test_table"},
    "rowkey":"key",
    "columns":{
        "KEY_FIELD":{"cf":"rowkey", "col":"key", "type":"string"},
        "A":{"cf":"c", "col":"a", "type":"string"},
        "B":{"cf":"c", "col":"b", "type":"string"}
    }
}""".split())

data = [("3","DATA 3 A", "DATA 3 B")]
columns = ['KEY_FIELD','A','B']
cSchema = StructType([StructField(columnName, StringType()) for columnName in columns])
df = spark.createDataFrame(data, schema=cSchema)

# Write through the SHC data source instead of org.apache.hadoop.hbase.spark.
df.write\
      .options(catalog=catalog)\
      .options( **properties)\
      .format("org.apache.spark.sql.execution.datasources.hbase")\
      .save()
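
As a side note, the catalog JSON does not have to be written by hand; it can be derived from the same "hbase.columns.mapping" string that is already in properties. The following is only a minimal sketch: mapping_to_catalog is a hypothetical helper (it is not part of SHC or PySpark), and it assumes every mapping entry has the form NAME TYPE cf:col, with :key marking the row key.

```python
import json


def mapping_to_catalog(mapping, table, namespace="default"):
    """Hypothetical helper: turn an hbase.columns.mapping string
    into an SHC-style catalog JSON string."""
    columns = {}
    rowkey = None
    for entry in mapping.split(","):
        # Assumed entry layout: "<column name> <type> <cf>:<qualifier>"
        name, sql_type, target = entry.split()
        cf, col = target.split(":", 1)
        if cf == "":
            # An empty column family (":key") denotes the row key.
            cf = "rowkey"
            rowkey = col
        columns[name] = {"cf": cf, "col": col, "type": sql_type.lower()}
    if rowkey is None:
        raise ValueError("mapping has no row key entry (':key')")
    return json.dumps({
        "table": {"namespace": namespace, "name": table},
        "rowkey": rowkey,
        "columns": columns,
    })


catalog = mapping_to_catalog(
    "KEY_FIELD STRING :key, A STRING c:a, B STRING c:b", "test_table")
print(catalog)
```

The resulting string can be passed to the writer the same way as the hand-written one, i.e. via .options(catalog=catalog). Keeping a single source for the column mapping avoids the two definitions drifting apart.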

Upvotes: 0


Reputation: 5480

If you are using the Cloudera distribution, then hard luck: there is no official way to write to HBase using PySpark. This has been confirmed by the Cloudera support team.

But if you are using Hortonworks and have Spark 2.0, the link below should get you started.

Pyspark to Hbase write

Upvotes: 2
