Reputation: 13433
I have some files records in which are stored as plain text Json. A sample records:
{
"datasetID": "Orders",
"recordID": "rid1",
"recordGroupID":"asdf1",
"recordType":"asdf1",
"recordTimestamp": 100,
"recordPartitionTimestamp": 100,
"recordData":{
"customerID": "cid1",
"marketplaceID": "mid1",
"quantity": 10,
"buyingDate": "1481353448",
"orderID" : "oid1"
}
}
For each record, recordData
may be null
. If recordData
is present, orderID
may be null
.
I write the following Avro schema to represent the structure:
[{
"namespace":"model",
"name":"OrderRecordData",
"type":"record",
"fields":[
{"name":"marketplaceID","type":"string"},
{"name":"customerID","type":"string"},
{"name":"quantity","type":"long"},
{"name":"buyingDate","type":"string"},
{"name":"orderID","type":["null", "string"]}
]
},
{
"namespace":"model",
"name":"Order",
"type":"record",
"fields":[
{"name":"datasetID","type":"string"},
{"name":"recordID","type":"string"},
{"name":"recordGroupID","type":"string"},
{"name":"recordType","type":"string"},
{"name":"recordTimestamp","type":"long"},
{"name":"recordPartitionTimestamp","type":"long"},
{"name":"recordData","type": ["null", "model.OrderRecordData"]}
]
}]
Ans finally, I use the following method to de-serialize each String record into my Avro class:
Order jsonDecodeToAvro(String inputString) {
return new SpecificDatumReader<Order>(Order.class)
.read(null, DecoderFactory.get().jsonDecoder(Order.SCHEMA$, inputString));
}
But I keep getting the exception when trying to reach the above record:
org.apache.avro.AvroTypeException: Unknown union branch customerID
at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:445)
What am I doing wrong? I am using JDK8 and Avro 1.7.7
Upvotes: 0
Views: 2814
Reputation: 498
The json input must be in the form
{
"datasetID": "Orders",
"recordID": "rid1",
"recordGroupID":"asdf1",
"recordType":"asdf1",
"recordTimestamp": 100,
"recordPartitionTimestamp": 100,
"recordData":{
"model.OrderRecordData" :{
"orderID" : null,
"customerID": "cid1",
"marketplaceID": "mid1",
"quantity": 10,
"buyingDate": "1481353448"
}
}
}
This is because of the way Avro's JSON encoding handles unions and nulls.
Take a look at this:
There is also an open issue regarding this: https://issues.apache.org/jira/browse/AVRO-1582
Upvotes: 1