Tal
Tal

Reputation: 89

How to extract schema id from avro message in Spark Scala

I have spark Scala dataframe with column contains the value of avro message(Array[Byte]). I know that the 0 byte is the magic byte and the bytes in positions 1-4 included is the schema id. how can i extract those bytes (1-4) and add new column with the schema id value in int?

need to use some spark functions/udf in spark Scala to extrace the schema id value

Upvotes: 0

Views: 369

Answers (1)

You could do something like:

import org.apache.spark.sql.functions.udf

// extract the schema id (bytes 1-4) and convert it to integer
val extractSchemaId = udf((message: Array[Byte]) => {
  ByteBuffer.wrap(message.slice(1, 5)).getInt()
})

// assuming the "value" column holds the Avro message
df.withColumn("schema_id", extractSchemaId(df("value")))

Upvotes: 1

Related Questions