ohaionm

Reputation: 31

DataFrameReader throwing "Unsupported type NULL" while reading avro file

I am trying to read an Avro file into a DataFrame, but I keep getting:

org.apache.spark.sql.avro.IncompatibleSchemaException: Unsupported type NULL

Since I am going to deploy it on Dataproc, I am using Spark 2.4.0, but the same thing happened when I tried other versions.

These are my dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
</dependencies>

My main class:

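import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MainClass {

    public static void main(String[] args) {

        // The session builder sets the application name; a separate SparkConf is not needed
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark SQL basic example")
                .getOrCreate();

        // The read that fails with IncompatibleSchemaException: Unsupported type NULL
        Dataset<Row> rowDataset = spark.read().format("avro").load("avro_file");
    }
}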

Running command:

spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0 --master local[*] --class MainClass my-spak-app.jar

After running a lot of tests, I concluded that it happens because my Avro schema has a field defined with "type": "null". I do not create the files I am working on, so I can't change the schema. I can read the files when I use an RDD and load them with the newAPIHadoopFile method.

Is there a way to read Avro files with "type": "null" using a DataFrame, or will I have to work with RDDs?

Upvotes: 2

Views: 1424

Answers (1)

Saswat

Reputation: 71

You can specify a schema when you read the file. First create a schema for your file, then pass it to the reader:

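import org.apache.spark.sql.types._

// Spark SQL schema listing only the fields to read ("XXX" and "YYY" are placeholders)
val ACCOUNT_schema = StructType(List(
    StructField("XXX", DateType, true),
    StructField("YYY", StringType, true)))

// A StructType is passed with .schema(...); the "avroSchema" option expects an Avro schema as a JSON string instead
val rowDataset = spark.read.format("avro").schema(ACCOUNT_schema).load("avro_file")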

I am not very familiar with Java syntax, but I think you can adapt it.
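For reference, a rough Java sketch of the same idea. It assumes the field names "XXX" and "YYY" are placeholders for the real fields in the file, and that leaving the "type": "null" field out of the schema is enough to stop the reader from trying to convert it; I have not verified that on every Spark version. The class name ReadAvroWithSchema and the helper buildSchema are made up for this example.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ReadAvroWithSchema {

    // Hypothetical schema: "XXX" and "YYY" are the placeholder names from the answer.
    // List only the fields you want to read and leave out the one declared as "type": "null".
    public static StructType buildSchema() {
        return DataTypes.createStructType(new StructField[] {
                DataTypes.createStructField("XXX", DataTypes.DateType, true),
                DataTypes.createStructField("YYY", DataTypes.StringType, true)
        });
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Read Avro with explicit schema")
                .getOrCreate();

        // Supplying the schema up front means Spark does not infer it from the Avro file
        Dataset<Row> rowDataset = spark.read()
                .format("avro")
                .schema(buildSchema())
                .load("avro_file");

        rowDataset.printSchema();
        spark.stop();
    }
}
```

It would be submitted the same way as in the question, with the spark-avro package on the command line. If the exception still appears, the RDD route with newAPIHadoopFile mentioned in the question remains the fallback.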

Upvotes: 2
