Reputation: 321
I get an exception when I deserialize a message with a field defined with the logicalType date. Following the documentation, the field is defined as:
{"name": "startDate", "type": {"type": "int", "logicalType": "date"}}
I use avro-maven-plugin (1.9.2) to generate the Java classes, and I can set the field startDate to java.time.LocalDate.now(); the Avro object is serialized and the message is sent to a Kafka topic. So far, everything is good.
However, when I read the message I get the exception:
Caused by: org.apache.avro.InvalidNumberEncodingException: Invalid int encoding
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:166)
at org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:83)
at org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:551)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:195)
at org.apache.avro.generic.GenericDatumReader.readWithConversion(GenericDatumReader.java:173)
at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:134)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
What makes this even weirder is that no error occurs if I set a different date, such as LocalDate.of(1970, 1, 1).
In other words, if the serialized int value representing the number of days since 01/01/1970 is small enough, everything works fine. I tried that test after having a look at the code that raises the exception; it made me think that the error could be avoided if the day value is lower than 127:
public int readInt() throws IOException {
    this.ensureBounds(5);
    int len = 1;
    int b = this.buf[this.pos] & 255;
    int n = b & 127;
    if (b > 127) {
        b = this.buf[this.pos + len++] & 255;
        n ^= (b & 127) << 7;
        if (b > 127) {
            b = this.buf[this.pos + len++] & 255;
            n ^= (b & 127) << 14;
            if (b > 127) {
                b = this.buf[this.pos + len++] & 255;
                n ^= (b & 127) << 21;
                if (b > 127) {
                    b = this.buf[this.pos + len++] & 255;
                    n ^= (b & 127) << 28;
                    if (b > 127) {
                        throw new InvalidNumberEncodingException("Invalid int encoding");
                    }
                }
            }
        }
    }
    ....
Of course, I can't restrict production data to dates close to 01/01/1970. Any help is welcome :-)
Upvotes: 0
Views: 1624
Reputation: 86203
The code that you have posted can deserialize numbers not only up to 127, but across the full range of a Java int, so up to a couple of billion, corresponding to dates several million years after 1970.
The BinaryDecoder.readInt method from Apache Avro deserializes between 1 and 5 bytes into a Java int. It uses the low 7 bits of each byte for the int, that is, every bit except the sign bit. The sign bit is instead used to determine how many bytes to read: a sign bit of 0 means this is the last byte, and a sign bit of 1 means more bytes follow. The exception is thrown when 5 bytes have been read and all of them had their sign bit set to 1. 5 bytes can supply 35 bits and an int can hold 32 bits, so treating more than 5 bytes as an error is fair.
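To make the rule above concrete, here is a minimal sketch of the same decoding written as a loop instead of nested ifs. The class and method names (VarIntSketch, readVarInt) are my own and not part of Avro, and it only covers the part of the method you posted (whatever follows the "...." in your snippet is not reproduced). It uses |= where your code uses ^=; since the 7-bit groups never overlap, the result is the same.

```java
import java.io.IOException;

// Hypothetical helper, not Avro code: a loop form of the unrolled decoder from the question.
final class VarIntSketch {

    /** Reads 1 to 5 bytes starting at pos; the low 7 bits are payload, the high bit means "more follows". */
    static int readVarInt(byte[] buf, int pos) throws IOException {
        int n = 0;
        for (int i = 0; i < 5; i++) {
            int b = buf[pos + i] & 0xFF;   // treat the byte as unsigned
            n |= (b & 0x7F) << (7 * i);    // place the 7 payload bits at their position
            if (b <= 0x7F) {               // high bit clear: this was the last byte
                return n;
            }
        }
        // All five bytes had their high bit set
        throw new IOException("Invalid int encoding");
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {(byte) 0b11011110, (byte) 0b10010010, (byte) 0b1};
        System.out.println(readVarInt(bytes, 0)); // prints 18782
    }
}
```

The loop stops at the first byte whose high bit is 0, so a small value such as 127 occupies one byte and a larger value simply occupies more.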
So from the code that you have posted no dates that I would reasonably expect to use in an application will pose any problems.
I put your method in a TestBinaryDecoder class to try it out (full code at the end). Let’s first see how the exception comes from 5 bytes all having their sign bit set to 1:
try {
    System.out.println(new TestBinaryDecoder(-1, -1, -1, -1, -1).readInt());
} catch (IOException ioe) {
    System.out.println(ioe);
}
Output:
ovv.so.binary.misc.InvalidNumberEncodingException: Invalid int encoding
Also as you said, 127 poses no problem:
System.out.println(new TestBinaryDecoder(127, -1, -1, -1, -1).readInt());
127
The interesting part comes when we use more bytes to hold the bits of the int that we want. Here the first byte has a sign bit of 1 and the next has 0, so I expect those two bytes to be used:
System.out.println(new TestBinaryDecoder(255, 127, -1, -1, -1).readInt());
16383
We are already getting close to the number needed for today’s date. Today is 2021-06-04 in my time zone, day 18782 after the epoch, or in binary: 100100101011110. So let’s try putting those 15 binary digits into three bytes for the decoder:
int epochDay = new TestBinaryDecoder(0b11011110, 0b10010010, 0b1, -1, -1).readInt();
System.out.println(epochDay);
System.out.println(LocalDate.ofEpochDay(epochDay));
18782
2021-06-04
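For the opposite direction, here is a hedged sketch of how those three bytes can be derived from the epoch day rather than written out by hand. The names VarIntEncodeSketch and writeVarInt are my own. It deliberately mirrors only the decoding shown above (plain 7-bit groups with a continuation bit); Avro's specification additionally zig-zag encodes int values before this step, and that part is not modelled here.

```java
import java.io.ByteArrayOutputStream;
import java.time.LocalDate;

// Hypothetical helper, not Avro code: the inverse of the decoder as posted (no zig-zag step).
final class VarIntEncodeSketch {

    /** Splits the value into 7-bit groups, lowest group first, setting the high bit on all but the last byte. */
    static byte[] writeVarInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        int n = value;
        while ((n & ~0x7F) != 0) {          // more than 7 significant bits remain
            out.write((n & 0x7F) | 0x80);   // emit 7 bits plus the "more follows" bit
            n >>>= 7;
        }
        out.write(n);                       // last byte, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int epochDay = (int) LocalDate.of(2021, 6, 4).toEpochDay(); // 18782
        for (byte b : writeVarInt(epochDay)) {
            System.out.println(Integer.toBinaryString(b & 0xFF));
        }
        // prints:
        // 11011110
        // 10010010
        // 1
    }
}
```

These are the same three byte values that I passed to the TestBinaryDecoder constructor above.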
So how you got your exception, I can’t tell. The cause surely isn’t just a large int value. The problem must be somewhere else.
import java.io.IOException;
import java.util.stream.IntStream;

public class TestBinaryDecoder {

    private byte[] buf;
    private int pos;

    /** Convenience constructor: each int is narrowed to a byte, so -1 becomes 0xFF. */
    public TestBinaryDecoder(int... buf) {
        this(toByteArray(buf));
    }

    private static byte[] toByteArray(int[] intArray) {
        byte[] byteArray = new byte[intArray.length];
        IntStream.range(0, intArray.length).forEach(ix -> byteArray[ix] = (byte) intArray[ix]);
        return byteArray;
    }

    public TestBinaryDecoder(byte[] buf) {
        this.buf = buf;
        pos = 0;
    }

    /** The method from the question, returning the accumulated value. */
    public int readInt() throws IOException {
        this.ensureBounds(5);
        int len = 1;
        int b = this.buf[this.pos] & 255;
        int n = b & 127;
        if (b > 127) {
            b = this.buf[this.pos + len++] & 255;
            n ^= (b & 127) << 7;
            if (b > 127) {
                b = this.buf[this.pos + len++] & 255;
                n ^= (b & 127) << 14;
                if (b > 127) {
                    b = this.buf[this.pos + len++] & 255;
                    n ^= (b & 127) << 21;
                    if (b > 127) {
                        b = this.buf[this.pos + len++] & 255;
                        n ^= (b & 127) << 28;
                        if (b > 127) {
                            throw new InvalidNumberEncodingException("Invalid int encoding");
                        }
                    }
                }
            }
        }
        return n;
    }

    /** Stand-in for the real bounds check; it only logs the request. */
    private void ensureBounds(int bounds) {
        System.out.println("Ensuring bounds " + bounds);
    }
}

class InvalidNumberEncodingException extends IOException {
    public InvalidNumberEncodingException(String message) {
        super(message);
    }
}
Upvotes: 1