hadooper

Reputation: 93

spark-avro databricks package

I am trying to include the spark-avro package while starting spark-shell, as per the instructions mentioned here: https://github.com/databricks/spark-avro#with-spark-shell-or-spark-submit.

spark-shell --packages com.databricks:spark-avro_2.10:2.0.1

My intent is to convert the Avro schema to a Spark schema type, using the SchemaConverters class present in the package.

import com.databricks.spark.avro._ ... // colListDel is the list of fields from the .avsc which are to be deleted for some functional reason.

for( field <- colListDel){
 println(SchemaConverters.toSqlType(field.schema()).dataType);
}

...

On executing the above for loop, I get the below error:

<console>:47: error: object SchemaConverters in package avro cannot be accessed in package com.databricks.spark.avro
            println(SchemaConverters.toSqlType(field.schema()).dataType);

Please suggest if there is anything I am missing, or let me know how to include SchemaConverters in my Scala code.

Below are my environment details: Spark version 1.6.0, Cloudera VM 5.7.

Thanks!

Upvotes: 4

Views: 2634

Answers (1)

Piotr Reszke

Reputation: 1596

This object and the mentioned method used to be private. Please check the source code from version 1.0:

https://github.com/databricks/spark-avro/blob/branch-1.0/src/main/scala/com/databricks/spark/avro/SchemaConverters.scala

private object SchemaConverters {
  case class SchemaType(dataType: DataType, nullable: Boolean)
  /**
   * This function takes an avro schema and returns a sql schema.
   */
  private[avro] def toSqlType(avroSchema: Schema): SchemaType = {
    avroSchema.getType match {
    ...

You were downloading the 2.0.1 version, which was probably not built from the latest 2.0 branch. I checked the 3.0 version, and this class and method are public now.

This should solve your problems:

spark-shell --packages com.databricks:spark-avro_2.10:3.0.0
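
With the 3.0.0 package on the classpath, something like the following should work in spark-shell. This is a minimal sketch: the .avsc path is a hypothetical placeholder, and iterating over all fields stands in for your colListDel selection.

import org.apache.avro.Schema
import com.databricks.spark.avro.SchemaConverters
import scala.collection.JavaConverters._

// Parse the Avro schema from a file (path is a placeholder)
val avroSchema = new Schema.Parser().parse(new java.io.File("/path/to/record.avsc"))

// Convert each field's Avro schema to the corresponding Spark SQL type
for (field <- avroSchema.getFields.asScala) {
  println(SchemaConverters.toSqlType(field.schema()).dataType)
}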

EDIT: added after comment

The spark-avro 3.0.0 library requires Spark 2.0, so you can replace your current Spark with the 2.0 version. The other option would be to contact Databricks and ask them to build a 2.0.2 version from the latest 2.0 branch.

Upvotes: 1
