GustavoT
GustavoT

Reputation: 31

Caused by: org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -53

Trying to use Flink to read a Kafka stream of "avro" serialized data, like this:

        tableEnv.connect(new Kafka()
                .version("0.11")
                .topic(source.getTopic())
                .properties(source.getProperties())
                .startFromLatest())       
        .withSchema(Schemafy.getSchemaFromJson(source.getAvroSchema()))
                .withFormat(new Avro()  
                    .avroSchema("{  \"namespace\": \"io.avrotweets\",  \"type\": \"record\",  \"name\": \"value\",  \"fields\": [    {      \"type\": \"string\",      \"name\": \"id\"    },    {      \"type\": \"string\",      \"name\": \"screen_name\"    },    {      \"type\": \"string\",      \"name\": \"text\"    }  ]}")
                )
                .inAppendMode()
                .registerTableSource(source.getName());

I get the following exception:

java.io.IOException: Failed to deserialize Avro record.
    at org.apache.flink.formats.avro.AvroRowDeserializationSchema.deserialize(AvroRowDeserializationSchema.java:170)
    at org.apache.flink.formats.avro.AvroRowDeserializationSchema.deserialize(AvroRowDeserializationSchema.java:78)
    at org.apache.flink.streaming.util.serialization.KeyedDeserializationSchemaWrapper.deserialize(KeyedDeserializationSchemaWrapper.java:44)
    at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.runFetchLoop(Kafka09Fetcher.java:142)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:738)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:94)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -53
    at org.apache.avro.io.BinaryDecoder.doReadBytes(BinaryDecoder.java:336)
    at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:263)
    at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422)
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:414)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:181)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:232)
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:122)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:222)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:175)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:145)
    at org.apache.flink.formats.avro.AvroRowDeserializationSchema.deserialize(AvroRowDeserializationSchema.java:167)

I think the problem is that the message key was also serialized but using its own schema:

{
  "namespace": "io.avrotweets",
  "type": "record",
  "name": "key",
  "fields": [
    {
      "type": "string",
      "name": "name"
    }
  ]
}

but where do I tell the connector to use that schema for the key. In any case I don't know if that is the issue or not, just a guess.

Upvotes: 2

Views: 14620

Answers (1)

Georgi  Stoyanov
Georgi Stoyanov

Reputation: 539

The schema is different. For serialization you are using different number of fields, different names for fields, different name of the record. Afaik you need to have same avro schema for same object. If you want to deserialize only some of the objects, think that you can use "default" parameter.

Upvotes: 0

Related Questions