AJ007

Reputation: 125

How to use avro data with old and new namespace

I have run into a problem after updating the namespace in my avsc schema file. We use a common processor, written in Java, that parses XML into Avro using the avsc file.

We have since separated the interfaces and created two different namespaces, so we now have two avsc schemas that are identical except for the namespace.

Because we still have data that was generated with the old namespace, I am unable to query it together with the new data generated with the new namespace.

Here is an example of my schemas:

Old schema - "type" : "record", "name" : "Message", "namespace" : "com.myfirstavsc", "fields" : [ { "name" : "Header",.....**other fields**

New schema - "type" : "record", "name" : "Message", "namespace" : "com.mysecondavsc", "fields" : [ { "name" : "Header",.....**other fields**

When I query my Hive table, I get the exception below:

Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found com.myfirstavsc.Property, expecting union

Upvotes: 3

Views: 3813

Answers (2)

Teng

Reputation: 368

http://apache-avro.679487.n3.nabble.com/Deserialize-with-different-schema-td4032782.html

The link above no longer works, so I am adding an explanation here.

We got the same error in a project named Hudi, so we raised an issue about it: https://github.com/apache/hudi/issues/7284

After troubleshooting, we found that the root cause of the exception org.apache.avro.AvroTypeException: Found hoodie.test_mor_tab.test_mor_tab_record.new_test_col.fixed, expecting union is a rule in Avro's schema resolution code: it does not accept a namespace change when resolving a UNION type.

According to the Avro Schema Resolution doc, schema evolution is accepted when either the reader or the writer schema passed to GenericDatumReader(Schema writer, Schema reader) is a union. But it does not mention another restriction: when the type is RECORD, ENUM, or FIXED, the union branch is matched by the full name of the schema (namespace included). In the code below, the only fallback is a structural match for RECORD; ENUM and FIXED must match by full name exactly.

Code reference:

ResolvingGrammarGenerator#bestBranch

public class ResolvingGrammarGenerator extends ValidatingGrammarGenerator {
...
  private int bestBranch(Schema r, Schema w, Map<LitS, Symbol> seen) throws IOException {
    Schema.Type vt = w.getType();
      // first scan for exact match
      int j = 0;
      int structureMatch = -1;
      for (Schema b : r.getTypes()) {
        if (vt == b.getType())
          if (vt == Schema.Type.RECORD || vt == Schema.Type.ENUM ||
              vt == Schema.Type.FIXED) {
            String vname = w.getFullName();
            String bname = b.getFullName();
            // return immediately if the name matches exactly according to spec
            if (vname != null && vname.equals(bname))
              return j;

            if (vt == Schema.Type.RECORD &&
                !hasMatchError(resolveRecords(w, b, seen))) {
              String vShortName = w.getName();
              String bShortName = b.getName();
              // use the first structure match or one where the name matches
              if ((structureMatch < 0) ||
                  (vShortName != null && vShortName.equals(bShortName))) {
                structureMatch = j;
              }
            }
          } else
            return j;
        j++;
      }

      // if there is a record structure match, return it
      if (structureMatch >= 0)
        return structureMatch;

      // then scan match via numeric promotion
      j = 0;
      for (Schema b : r.getTypes()) {
        switch (vt) {
        case INT:
          switch (b.getType()) {
          case LONG: case DOUBLE:
            return j;
          }
          break;
        case LONG:
        case FLOAT:
          switch (b.getType()) {
          case DOUBLE:
            return j;
          }
          break;
        case STRING:
          switch (b.getType()) {
          case BYTES:
            return j;
          }
          break;
        case BYTES:
          switch (b.getType()) {
          case STRING:
            return j;
          }
          break;
        }
        j++;
      }
      return -1;
  }
...
}
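
To make the name check concrete, here is a minimal sketch of the first scan in bestBranch. It is a simplified model, not Avro's code: the class UnionBranchSketch, the Sch and Kind types, and the fixed name Checksum are all hypothetical, and it deliberately leaves out the RECORD structural fallback and the numeric promotion scan shown above.

```java
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of union branch selection; these are NOT Avro classes.
final class UnionBranchSketch {
    enum Kind { RECORD, ENUM, FIXED, STRING, NULL }

    // Stand-in for a schema: a kind plus, for named kinds, a full name.
    static final class Sch {
        final Kind kind;
        final String fullName; // null for unnamed kinds such as STRING or NULL

        Sch(Kind kind, String fullName) {
            this.kind = kind;
            this.fullName = fullName;
        }
    }

    // Returns the index of the reader-union branch that matches the writer
    // schema, or -1 if no branch matches. Named kinds match by full name only.
    static int bestBranchByName(List<Sch> readerUnion, Sch writer) {
        for (int j = 0; j < readerUnion.size(); j++) {
            Sch b = readerUnion.get(j);
            if (b.kind != writer.kind) {
                continue;
            }
            boolean named = writer.kind == Kind.RECORD
                    || writer.kind == Kind.ENUM
                    || writer.kind == Kind.FIXED;
            if (!named) {
                return j; // unnamed kinds match on type alone
            }
            if (writer.fullName != null && writer.fullName.equals(b.fullName)) {
                return j; // namespace is part of the full name
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Reader union: ["null", fixed com.mysecondavsc.Checksum] (hypothetical names)
        List<Sch> readerUnion = Arrays.asList(
                new Sch(Kind.NULL, null),
                new Sch(Kind.FIXED, "com.mysecondavsc.Checksum"));

        Sch oldWriter = new Sch(Kind.FIXED, "com.myfirstavsc.Checksum");
        Sch newWriter = new Sch(Kind.FIXED, "com.mysecondavsc.Checksum");

        System.out.println(bestBranchByName(readerUnion, oldWriter)); // prints -1
        System.out.println(bestBranchByName(readerUnion, newWriter)); // prints 1
    }
}
```

In this model, data written under the old namespace finds no branch in a reader union declared under the new namespace, which corresponds to the "Found ..., expecting union" failure described above.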

Upvotes: 0

hlagos

Reputation: 7947

I am not sure how you are reading your data, but using GenericDatumReader should solve your issue; after that, you can convert the generic record to your specific records. I found something similar here:

http://apache-avro.679487.n3.nabble.com/Deserialize-with-different-schema-td4032782.html

Upvotes: 1
