Timbuck

Reputation: 423

Why does an Avro field that was string now require avro.java.string type?

In Avro IDL I have a Message record defined as follows:

record Message {
    MessageId id;
    array<string> dataField;
}

I am using this record in another record with a null union:

record Invoice{
    ...
    union {null,array<Message>} message;
}

We have a Java Kafka consumer (we're using Confluent Platform) whose classes are generated by avro-maven-plugin version 1.10.2, configured with <stringType>String</stringType>.

When we make a call such as this:

List<String> msgList = message.getDataField();
for (String msg : msgList) {...}

we receive the following error on the second line: class org.apache.avro.util.Utf8 cannot be cast to class java.lang.String
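
The exception appears on the for line rather than on getDataField() because of Java type erasure: List<String> is only a compile-time promise, and the cast to String is inserted where each element is read. The minimal, self-contained sketch below reproduces that behavior. StringBuilder stands in for org.apache.avro.util.Utf8 (both implement CharSequence but neither is a String), and fakeGetDataField() is a hypothetical stand-in for the generated getter, not Avro code.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // Hypothetical stand-in for the generated getDataField(): the declared
    // type says List<String>, but the elements are another CharSequence type.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> fakeGetDataField() {
        List raw = new ArrayList();
        raw.add(new StringBuilder("payload")); // plays the role of Utf8
        return (List<String>) raw;             // unchecked cast: nothing is checked at runtime
    }

    static boolean iterationFails() {
        List<String> msgList = fakeGetDataField(); // no exception here
        try {
            for (String msg : msgList) {          // compiler-inserted (String) cast fails here
                System.out.println(msg);
            }
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ClassCastException on iteration: " + iterationFails());
    }
}
```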

Previously, the Invoice object was defined as:

record Invoice{
    ...
    array<Message> message;
}

and we did not receive this error. We have found that in our schema file, changing from

"name" : "dataField",
"type" : {
  "type" : "array",
  "items" : "string"
}

to

"name" : "dataField",
"type" : {
  "type" : "array",
  "items" : {
    "type" : "string",
    "avro.java.string" : "String"
  }
}

corrects the problem.

I'm unclear as to why adding the union caused this change in behavior. Should I declare all of the strings in the schema with the avro.java.string property, and if so, how do I do that with Avro IDL?

Upvotes: 1

Views: 5406

Answers (2)

Akta Kalariya

Reputation: 111

We can avoid changes to the actual schema by setting the property avro.remove.java.properties=true.

The generated POJO still contains avro.java.string, but the serializer ignores it when communicating with Schema Registry, so it will not fail by trying to register the modified schema as a new version. After this, no change is required in the Avro schema file or in the POJO.

This is described in the Confluent documentation: https://docs.confluent.io/platform/current/schema-registry/serdes-develop/serdes-avro.html#avro-serializer
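
A sketch of where that setting goes, assuming the producer is configured through a plain java.util.Properties object. The avro.remove.java.properties key comes from the linked documentation; the broker and Schema Registry URLs are placeholders, and producerProps() is a hypothetical helper.

```java
import java.util.Properties;

public class RemoveJavaPropsConfig {
    // Hypothetical helper: producer settings for the Confluent Avro serializer.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // placeholder
        props.put("schema.registry.url", "http://localhost:8081"); // placeholder
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        // Drop Java-specific properties such as avro.java.string from the schema
        // before it is registered or looked up in Schema Registry.
        props.put("avro.remove.java.properties", "true");
        return props;
    }

    public static void main(String[] args) {
        producerProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```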

Upvotes: 3

Timbuck

Reputation: 423

At this point, there appear to be a few ways to resolve this, at least when using Confluent Platform version 5.5.1 or later. I consider the problem to be an open defect in Avro.

The first option is to update the Avro schema file with a global search and replace, changing every "type":"string" to

"type": {
  "avro.java.string": "String",
  "type": "string"
}

This would need to be done after generating the schema files from Avro IDL, since IDL doesn't support this construct, which makes IDL less useful in this case. Strangely, this approach does not appear to affect records that come in via REST Proxy with a plain "type":"string" and no avro.java.string property. They appear able to use a schema defined either way; I was expecting the updated schema with the avro.java.string property to cause problems with REST Proxy requests that lack it.
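
The search and replace can be scripted. Below is a minimal sketch using java.util.regex, assuming the .avsc content has already been read into a String; addJavaStringType() is a hypothetical helper. Note that this pattern only matches "type":"string", so shorthand forms such as "items": "string" (the dataField case in the question) or a "string" branch inside a union array are left untouched and would need separate handling.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AvscStringRewriter {
    // Matches "type":"string", allowing whitespace around the colon.
    private static final Pattern STRING_TYPE =
            Pattern.compile("\"type\"\\s*:\\s*\"string\"");

    private static final String REPLACEMENT =
            "\"type\": {\"avro.java.string\": \"String\", \"type\": \"string\"}";

    // Hypothetical helper. Run it once per file: a second pass would match the
    // inner "type": "string" and wrap it again.
    static String addJavaStringType(String avsc) {
        return STRING_TYPE.matcher(avsc).replaceAll(Matcher.quoteReplacement(REPLACEMENT));
    }

    public static void main(String[] args) {
        String field = "{\"name\": \"note\", \"type\":\"string\"}";
        // prints {"name": "note", "type": {"avro.java.string": "String", "type": "string"}}
        System.out.println(addJavaStringType(field));
    }
}
```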

The second option is to set auto.register.schemas=false and use.latest.version=true, although this may cause unintended compatibility problems later.
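
A sketch of those two settings, again assuming Properties-based serializer configuration. serializerProps() is a hypothetical helper and the Schema Registry URL is a placeholder.

```java
import java.util.Properties;

public class LatestVersionConfig {
    // Hypothetical helper: serializer settings that disable automatic registration
    // and serialize against the latest version already registered for the subject.
    static Properties serializerProps() {
        Properties props = new Properties();
        props.put("schema.registry.url", "http://localhost:8081"); // placeholder
        props.put("auto.register.schemas", "false"); // never register a new schema version
        props.put("use.latest.version", "true");     // use the subject's latest registered version
        return props;
    }

    public static void main(String[] args) {
        serializerProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```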

The third option is to simply not use the <stringType> directive in the avro-maven-plugin configuration. This means a lot of coding around the CharSequence type that is generated by default, usually in the form of .toString() calls.
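
With the default CharSequence string type, array fields come back as List<CharSequence>, so one way to contain the .toString() calls is a small conversion helper. The following is a minimal sketch; toStrings() is a hypothetical helper, and StringBuilder again stands in for Utf8 in the example.

```java
import java.util.ArrayList;
import java.util.List;

public class CharSequences {
    // Hypothetical helper: converts each element (e.g. Utf8 or String) to a String,
    // preserving nulls.
    static List<String> toStrings(List<? extends CharSequence> values) {
        List<String> out = new ArrayList<>(values.size());
        for (CharSequence cs : values) {
            out.add(cs == null ? null : cs.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<CharSequence> dataField = new ArrayList<>();
        dataField.add(new StringBuilder("a")); // stands in for Utf8
        dataField.add("b");
        System.out.println(toStrings(dataField)); // [a, b]
    }
}
```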

Upvotes: 3
