db1234

Reputation: 807

Extract avro schema

Similar to the question "How to extract schema for avro file in python": is there a way to read an Avro file in Go without knowing the schema beforehand, and to extract the schema from it?

Upvotes: 2

Views: 3041

Answers (2)

Kyle Chadha

Reputation: 4161

Both https://github.com/hamba/avro and https://github.com/linkedin/goavro can decode Avro OCF (object container file) files, which sounds like what you have, without an explicit schema file.

Once you've created a new reader/decoder, you can retrieve the file's metadata, which holds the schema under the key avro.schema. See https://pkg.go.dev/github.com/hamba/avro/ocf#Decoder.Metadata and https://pkg.go.dev/github.com/linkedin/goavro#OCFReader.MetaData

Upvotes: 0

draganstankovic

Reputation: 5426

How about something like this (code adapted from https://github.com/hamba/avro/blob/master/ocf/ocf.go):

package main

import (
    "log"
    "os"

    "github.com/hamba/avro"
)

// HeaderSchema is the Avro schema of a container file header.
var HeaderSchema = avro.MustParse(`{
    "type": "record",
    "name": "org.apache.avro.file.Header",
    "fields": [
        {"name": "magic", "type": {"type": "fixed", "name": "Magic", "size": 4}},
        {"name": "meta", "type": {"type": "map", "values": "bytes"}},
        {"name": "sync", "type": {"type": "fixed", "name": "Sync", "size": 16}}
    ]
}`)

var magicBytes = [4]byte{'O', 'b', 'j', 1}

const (
    schemaKey = "avro.schema"
)

// Header represents an Avro container file header.
type Header struct {
    Magic [4]byte           `avro:"magic"`
    Meta  map[string][]byte `avro:"meta"`
    Sync  [16]byte          `avro:"sync"`
}

func main() {
    r, err := os.Open("path/my.avro")
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

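    // Decode only the container header; the data blocks are never read.
    reader := avro.NewReader(r, 1024)

    var h Header
    reader.ReadVal(HeaderSchema, &h)
    if reader.Error != nil {
        // Printf-style verbs need Fatalf/Printf; Println would print "%v" literally.
        log.Fatalf("decoder: unexpected error: %v", reader.Error)
    }

    if h.Magic != magicBytes {
        log.Fatal("decoder: invalid avro file")
    }

    // The writer's schema is stored as JSON under the "avro.schema" metadata key.
    schema, err := avro.Parse(string(h.Meta[schemaKey]))
    if err != nil {
        log.Fatal(err)
    }
    log.Println(schema)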
}

Upvotes: 4
