Alon Granot

Reputation: 597

How to read fixed length array decimals from parquet files correctly?

I use the Apache parquet-cpp library to read Parquet files. When I read decimal values stored with the fixed-length byte array physical type (FIXED_LEN_BYTE_ARRAY), I get an extra leading byte that seems (I think) to indicate whether the number is negative. I can't find any documentation about this. I am also not sure why it's needed, since the number is already in two's complement and therefore signed anyway.

Here's an example: a negative number that fits in 8 bytes as CE8DFDC498D5D5F5 is returned in 9 bytes as FFCE8DFDC498D5D5F5.

Does anyone know why this could be? Are there official resources on this?

Upvotes: 0

Views: 743

Answers (1)

Micah Kornfield

Reputation: 1718

You are on the right track for negative values, but FF is not a negative indicator; it is a sign extension (for a positive number you would have a leading 0x00). Since the values are fixed width, negative values need to be sign-extended to preserve their interpretation in two's complement. The actual byte width used by parquet-cpp when writing the file is determined by the precision of the decimal value stored. The byte width on reading is fixed by the file.
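To make the sign extension concrete, here is a minimal sketch of decoding such a value by hand. Both helpers, DecodeBigEndianDecimal and MinBytesForPrecision, are hypothetical names for illustration and are not part of the parquet-cpp API. The decoder assumes the bytes are big-endian two's complement and that the unscaled value fits in 64 bits. The width helper is just the arithmetic for the smallest signed width that can hold every value of a given precision; the library's own computation may be implemented differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a parquet-cpp function): decode a big-endian
// two's complement byte string into an int64_t.
// Assumption: the value fits in 64 bits, so any bytes beyond the low 8
// are pure sign extension (all 0x00 or all 0xFF).
inline int64_t DecodeBigEndianDecimal(const uint8_t* bytes, std::size_t len) {
  assert(len > 0);
  // Seed with all ones when the sign bit of the first byte is set, so that
  // inputs shorter than 8 bytes are sign-extended as well.
  uint64_t acc = (bytes[0] & 0x80) ? ~uint64_t{0} : uint64_t{0};
  for (std::size_t i = 0; i < len; ++i) {
    acc = (acc << 8) | bytes[i];
  }
  // Reinterpret as signed (well-defined since C++20, and what two's
  // complement platforms did in practice before that).
  return static_cast<int64_t>(acc);
}

// Hypothetical helper: smallest number of bytes n such that a signed
// n-byte integer can hold 10^precision - 1, i.e. 10^precision <= 2^(8n-1).
inline int MinBytesForPrecision(int precision) {
  return static_cast<int>(
      std::ceil((precision * std::log2(10.0) + 1.0) / 8.0));
}
```

With these, the 9-byte FFCE8DFDC498D5D5F5 and the 8-byte CE8DFDC498D5D5F5 from the question decode to the same negative integer, so the leading FF can be safely dropped (or kept) when the value fits in 64 bits. The width helper gives 8 for precision 18 and 9 for precision 19, so a 9-byte width would be consistent with a column declared with precision 19, for example, even when a particular value would fit in fewer bytes.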

Upvotes: 1
