Reputation: 125
I ran into a problem after changing the namespace in my avsc schema file. We had been using a common processor, written in Java, that parses XML to Avro using that avsc file.
We have since separated the interfaces into two namespaces, so we now have two avsc schemas that are identical except for the namespace.
Because we have data that was generated with the old namespace, I am unable to query it together with new data generated with the new namespace.
Here is an example of my schemas:
Old schema:
    "type" : "record",
    "name" : "Message",
    "namespace" : "com.myfirstavsc",
    "fields" : [ {
    "name" : "Header",.....**other fields**
New schema:
    "type" : "record",
    "name" : "Message",
    "namespace" : "com.mysecondavsc",
    "fields" : [ {
    "name" : "Header",.....**other fields**
When I query my Hive table, I get the following exception:
Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found com.myfirstavsc.Property, expecting union
Upvotes: 3
Views: 3813
Reputation: 368
http://apache-avro.679487.n3.nabble.com/Deserialize-with-different-schema-td4032782.html
The link above no longer works, so I am adding an explanation here.
We hit the same error in a project named Hudi and raised an issue about it: https://github.com/apache/hudi/issues/7284
After troubleshooting, the root cause of the exception org.apache.avro.AvroTypeException: Found hoodie.test_mor_tab.test_mor_tab_record.new_test_col.fixed, expecting union turned out to be a rule in Avro's resolving grammar generator: it does not accept a change of namespace when handling a UNION type.
According to the Avro Schema Resolution doc, schema evolution is accepted when either the reader or the writer schema passed to GenericDatumReader(Schema writer, Schema reader) is a union. What the doc does not mention is a further restriction: when the type is RECORD, ENUM, or FIXED, the full name of the schema (namespace included) must be the same. In the code below, only RECORD has a fallback, through a structural match.
Code reference:
ResolvingGrammarGenerator#bestBranch
public class ResolvingGrammarGenerator extends ValidatingGrammarGenerator {
  ...
  private int bestBranch(Schema r, Schema w, Map<LitS, Symbol> seen) throws IOException {
    Schema.Type vt = w.getType();
    // first scan for exact match
    int j = 0;
    int structureMatch = -1;
    for (Schema b : r.getTypes()) {
      if (vt == b.getType())
        if (vt == Schema.Type.RECORD || vt == Schema.Type.ENUM ||
            vt == Schema.Type.FIXED) {
          String vname = w.getFullName();
          String bname = b.getFullName();
          // return immediately if the name matches exactly according to spec
          if (vname != null && vname.equals(bname))
            return j;
          if (vt == Schema.Type.RECORD &&
              !hasMatchError(resolveRecords(w, b, seen))) {
            String vShortName = w.getName();
            String bShortName = b.getName();
            // use the first structure match or one where the name matches
            if ((structureMatch < 0) ||
                (vShortName != null && vShortName.equals(bShortName))) {
              structureMatch = j;
            }
          }
        } else
          return j;
      j++;
    }
    // if there is a record structure match, return it
    if (structureMatch >= 0)
      return structureMatch;
    // then scan match via numeric promotion
    j = 0;
    for (Schema b : r.getTypes()) {
      switch (vt) {
      case INT:
        switch (b.getType()) {
        case LONG: case DOUBLE:
          return j;
        }
        break;
      case LONG:
      case FLOAT:
        switch (b.getType()) {
        case DOUBLE:
          return j;
        }
        break;
      case STRING:
        switch (b.getType()) {
        case BYTES:
          return j;
        }
        break;
      case BYTES:
        switch (b.getType()) {
        case STRING:
          return j;
        }
        break;
      }
      j++;
    }
    return -1;
  }
  ...
}
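To make the name check easier to see in isolation, here is a minimal standalone sketch of the first scan in bestBranch. It is not Avro code: Kind and Type are hypothetical stand-ins for Schema.Type and Schema, the type name Token is made up, and the record structure fallback and the numeric promotion scan are deliberately left out.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the "exact match" scan in bestBranch.
// Kind and Type are hypothetical stand-ins, not part of the Avro API.
class UnionBranchSketch {

    enum Kind { NULL, STRING, RECORD, ENUM, FIXED }

    static final class Type {
        final Kind kind;
        final String fullName; // null for unnamed types such as NULL or STRING

        Type(Kind kind, String fullName) {
            this.kind = kind;
            this.fullName = fullName;
        }
    }

    // Returns the index of the reader union branch matching the writer type, or -1.
    static int bestBranch(List<Type> readerUnion, Type writer) {
        for (int j = 0; j < readerUnion.size(); j++) {
            Type branch = readerUnion.get(j);
            if (branch.kind != writer.kind) {
                continue;
            }
            boolean named = writer.kind == Kind.RECORD
                    || writer.kind == Kind.ENUM
                    || writer.kind == Kind.FIXED;
            if (!named) {
                return j; // unnamed types match on kind alone
            }
            if (writer.fullName != null && writer.fullName.equals(branch.fullName)) {
                return j; // named types need the same namespace-qualified name
            }
        }
        // The quoted Avro code would still try a record structure match
        // and numeric promotion here; this sketch omits both.
        return -1;
    }

    public static void main(String[] args) {
        List<Type> readerUnion = Arrays.asList(
                new Type(Kind.NULL, null),
                new Type(Kind.FIXED, "com.mysecondavsc.Token"));

        Type oldWriter = new Type(Kind.FIXED, "com.myfirstavsc.Token");
        Type newWriter = new Type(Kind.FIXED, "com.mysecondavsc.Token");

        System.out.println(bestBranch(readerUnion, oldWriter)); // prints -1
        System.out.println(bestBranch(readerUnion, newWriter)); // prints 1
    }
}
```

In this model the FIXED type written under the old namespace finds no branch in the reader union, which corresponds to the "Found ..., expecting union" failure, while the same type under the new namespace resolves to branch 1.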
Upvotes: 0
Reputation: 7947
I am not sure how you are reading your data, but using GenericDatumReader should solve your issue; after that, you can convert the generic records to your specific records. I found something similar here:
http://apache-avro.679487.n3.nabble.com/Deserialize-with-different-schema-td4032782.html
Upvotes: 1