Reputation: 346
I'm trying to read an Avro file:
val df = spark.read.avro(file)
This fails with: Avro schema cannot be converted to a Spark SQL StructType: [ "null", "string" ]
Tried to manually create a schema, but now running into the following:
val s = StructType(List(StructField("value", StringType, nullable = true)))
val df = spark.read
.option("inferSchema", "false")
.schema(s)
.avro(file)
com.databricks.spark.avro.SchemaConverters$IncompatibleSchemaException: Cannot convert Avro schema to catalyst type because schema at path is not compatible (avroType = StructType(StructField(value,StringType,true)), sqlType = STRING). Source Avro schema: ["null","string"]. Target Catalyst type: StructType(StructField(value,StringType,true))
Trying to override the avro schema (without the null) also does not work:
val df = spark.read
.option("inferSchema", "false")
.option("avroSchema", """["string"]""")
.avro(file)
Avro schema cannot be converted to a Spark SQL StructType: [ "string" ]
Looks like spark-avro only creates a GenericDatumReader[GenericRecord] and I need a GenericDatumReader[Utf8] :(
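For what it's worth, one possible way around this is to skip spark-avro and read the file with the plain Avro Java API, which can deserialize a top-level union. A minimal sketch, assuming the Avro library is on the classpath and the file is small enough to read on the driver. `ReadUnionAvro`, `readValues` and `writeSample` are names made up for illustration, and `writeSample` is there only so the example is self-contained:

```scala
import java.io.File
import org.apache.avro.Schema
import org.apache.avro.file.{DataFileReader, DataFileWriter}
import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter}
import scala.collection.JavaConverters._

// Hypothetical helper object, not part of spark-avro or Avro itself.
object ReadUnionAvro {

  // Reads every datum from an Avro file whose top-level schema is
  // ["null", "string"]. The reader picks up the writer schema from the
  // file header, so no schema has to be passed in.
  def readValues(file: File): List[Option[String]] = {
    val reader = new DataFileReader[AnyRef](file, new GenericDatumReader[AnyRef]())
    try {
      // Strings come back as org.apache.avro.util.Utf8, nulls as null.
      reader.iterator().asScala.map(v => Option(v).map(_.toString)).toList
    } finally {
      reader.close()
    }
  }

  // Writes a tiny sample file with the same top-level union schema,
  // purely so the sketch can be run end to end.
  def writeSample(file: File): Unit = {
    val schema = new Schema.Parser().parse("""["null","string"]""")
    val writer = new DataFileWriter[AnyRef](new GenericDatumWriter[AnyRef](schema))
    writer.create(schema, file)
    try {
      writer.append("hello")
      writer.append(null)
    } finally {
      writer.close()
    }
  }
}
```

This reads everything into driver memory, so it is only a stopgap for small files rather than a replacement for a distributed read.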
Upvotes: 2
Views: 10479
Reputation: 88
Make sure you are supplying the correct AVSC for the data. The union ["null", "string"] is how Avro marks a string value as nullable. You can load the schema of your Avro file from its .avsc like this:
val schema = new Schema.Parser().parse(new File("user.avsc"))
Or, if you have a Java class generated from the schema (User below stands in for your generated class), you can get the schema from it:
val schema = User.getClassSchema
Once you have the schema, building a DataFrame from it is straightforward:
val df = sparkSession.read.format("com.databricks.spark.avro")
.option("avroSchema", schema.toString)
.load("/home/garvit.vijay/000009_0.avro")
df.printSchema()
df.show()
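Note that, judging by the "cannot be converted to a Spark SQL StructType" error in the question, the schema passed to `avroSchema` has to be a record at the top level, with the nullable union sitting on a field inside it. A minimal sketch of such a schema, assuming a record named `User` with a single field `value` (both names are placeholders, as is the `RecordSchemaExample` object):

```scala
import org.apache.avro.Schema

// Hypothetical wrapper object, used only to hold the example values.
object RecordSchemaExample {
  // A record schema whose single field is a nullable string.
  // "null" is listed first in the union so that the default can be null.
  val avsc: String =
    """{
      |  "type": "record",
      |  "name": "User",
      |  "fields": [
      |    {"name": "value", "type": ["null", "string"], "default": null}
      |  ]
      |}""".stripMargin

  // Parse the JSON text into an Avro Schema object.
  val schema: Schema = new Schema.Parser().parse(avsc)
}
```

`RecordSchemaExample.schema.toString` could then be passed to the `avroSchema` option, provided the file was actually written with a compatible record schema.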
Hope it works for you.
Upvotes: 2