Reputation: 31
I am trying to read an Avro file into a DataFrame, but I keep getting:
org.apache.spark.sql.avro.IncompatibleSchemaException: Unsupported type NULL
Since I am going to deploy it on Dataproc I am using Spark 2.4.0, but the same error occurred with other versions I tried.
Here are my dependencies:
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
My main class:
public static void main(String[] args) {
    SparkConf sparkConf = new SparkConf()
            .setAppName("Example");
    SparkSession spark = SparkSession
            .builder()
            .appName("Java Spark SQL basic example")
            .getOrCreate();
    Dataset<Row> rowDataset = spark.read().format("avro").load("avro_file");
}
Running command:
spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0 --master local[*] --class MainClass my-spark-app.jar
After running a lot of tests I concluded that it happens because my Avro schema contains a field defined with "type": "null". I am not the one creating the files, so I can't change the schema. I am able to read the files as an RDD using the newAPIHadoopFile method.
Is there a way to read Avro files with "type": "null" using a DataFrame, or will I have to work with RDDs?
Upvotes: 2
Views: 1424
Reputation: 71
You can specify a schema when you read the file. Create a schema that matches your file:

val accountSchema = StructType(List(
  StructField("XXX", DateType, true),
  StructField("YYY", StringType, true)))

val rowDataset = spark.read.format("avro").schema(accountSchema).load("avro_file")
I am not very familiar with Java syntax, but I think you can manage the translation.
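Since the question uses Java, here is a minimal sketch of the same idea in Java. The field names XXX and YYY and the path avro_file are placeholders carried over from the Scala snippet above; the assumption, as described in this answer, is that supplying the schema up front (declaring only the fields you actually need) keeps Spark from having to convert the Avro NULL-typed field:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ReadAvroWithSchema {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("Read Avro with explicit schema")
                .getOrCreate();

        // Declare only the columns you need; the "type": "null" field is
        // simply omitted from the schema.
        StructType schema = DataTypes.createStructType(new StructField[]{
                DataTypes.createStructField("XXX", DataTypes.DateType, true),
                DataTypes.createStructField("YYY", DataTypes.StringType, true)
        });

        Dataset<Row> rowDataset = spark.read()
                .format("avro")
                .schema(schema)
                .load("avro_file");

        rowDataset.show();
    }
}
```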
Upvotes: 2