Reputation: 597
I use the Apache parquet-cpp library to read Parquet files. When I read decimal values stored with the fixed-length byte array physical type, I get an extra leading byte which (I think) indicates whether the number is negative. I can't find any documentation about this. I am also not sure why it's needed, since the number is already in two's complement and therefore signed anyway.
Here's an example: a negative number that fits in 8 bytes, CE8DFDC498D5D5F5, comes back as 9 bytes: FFCE8DFDC498D5D5F5
Does anyone know why this could be? Are there official resources on this?
Upvotes: 0
Views: 743
Reputation: 1718
You are on the right track for negative values, but FF is not a negative indicator; it is a sign extension (for a positive number you would have a leading 0x00). Since the values are fixed width, negative values need to be sign-extended to preserve their interpretation in two's complement. The actual byte width used by parquet-cpp when writing the file is determined by the precision of the decimal value stored. The byte width on reading is fixed by the file (it is the FIXED_LEN_BYTE_ARRAY length recorded in the schema).
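To make the sign extension concrete, here is a minimal C++ sketch. It does not use the parquet-cpp API; `SignExtend`, `DecodeBigEndian`, and `MinBytesForPrecision` are hypothetical helper names made up for illustration. It assumes the bytes are big-endian two's complement (which is how Parquet stores the unscaled decimal value) and that the value fits in 64 bits. `MinBytesForPrecision` further assumes the writer picks the smallest width that can hold every value of the declared precision; that is a guess at the rule, not a quote of the library's code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: pad a big-endian two's complement value to `width`
// bytes by repeating the sign (0xFF for negative, 0x00 for non-negative).
std::vector<uint8_t> SignExtend(const std::vector<uint8_t>& value,
                                std::size_t width) {
  assert(width >= value.size());
  if (value.empty()) return std::vector<uint8_t>(width, 0x00);
  const uint8_t fill = (value[0] & 0x80) ? 0xFF : 0x00;
  std::vector<uint8_t> out(width - value.size(), fill);
  out.insert(out.end(), value.begin(), value.end());
  return out;
}

// Hypothetical helper: decode big-endian two's complement bytes into int64_t.
// Assumes the value fits in 64 bits, so any bytes beyond the low 8 are pure
// sign extension and can be shifted out without losing information.
int64_t DecodeBigEndian(const std::vector<uint8_t>& bytes) {
  if (bytes.empty()) return 0;
  // Start with all ones for a negative value so short inputs are extended.
  uint64_t acc = (bytes[0] & 0x80) ? ~uint64_t{0} : uint64_t{0};
  for (uint8_t b : bytes) acc = (acc << 8) | b;
  int64_t result;
  std::memcpy(&result, &acc, sizeof(result));  // reinterpret the bit pattern
  return result;
}

// Assumption: smallest n such that 10^precision <= 2^(8n - 1), i.e. every
// value with that many decimal digits fits in n signed bytes.
int MinBytesForPrecision(int precision) {
  return static_cast<int>(
      std::ceil((precision * std::log2(10.0) + 1.0) / 8.0));
}
```

With the bytes from the question, `SignExtend({0xCE, 0x8D, 0xFD, 0xC4, 0x98, 0xD5, 0xD5, 0xF5}, 9)` yields a leading 0xFF, and `DecodeBigEndian` returns the same negative number for the 8-byte and the 9-byte form. Under the assumption above, a precision of 18 fits in 8 bytes while a precision of 19 needs 9, which would be consistent with a 9-byte column holding values that happen to fit in 8.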
Upvotes: 1