Reputation: 29
Could anyone help me write a sample BigQuery UDF, in SQL or JavaScript, that converts a decimal value to packed decimal?
Upvotes: 0
Views: 119
Reputation: 3538
An IBM packed decimal integer is described here. A byte consists of 8 bits, and a value between zero and nine needs 4 bits, so one byte can hold two decimal digits.
The link gives an example: the value 21544
is transformed to the bit sequence:
0010-0001-0101-0100-0100-1111
2-1-5-4-4-positive sign
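The nibble layout of that example can be reproduced in a few lines of JavaScript. This is only a minimal sketch and not part of the UDFs in this answer: the function name `toNibbleString` is hypothetical, it assumes a non-negative safe integer, and it appends 1111 as the sign nibble exactly as in the linked example.

```javascript
// Hypothetical helper: lay out a non-negative integer as 4-bit groups,
// one group per decimal digit, followed by the sign nibble 1111.
function toNibbleString(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const nibbles = String(value)
    .split("")
    .map((digit) => Number(digit).toString(2).padStart(4, "0"));
  nibbles.push("1111"); // sign nibble taken from the example above
  return nibbles.join("-");
}

console.log(toNibbleString(21544));
// 0010-0001-0101-0100-0100-1111
```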
I could not find the negative sign. For positive values, however, the UDF packedINT should transform the value. Please check whether your target expects big-endian or little-endian encoding; if the order has to be reversed, the UDF packedINT_ may be the one you need.
CREATE TEMP FUNCTION binary(x INT64) AS ((
  -- Emit one character per bit, highest bit first, padded to whole bytes.
  SELECT STRING_AGG(CAST(x >> bit & 0x1 AS STRING), '' ORDER BY bit DESC)
  FROM UNNEST(GENERATE_ARRAY(0, CAST(8 * FLOOR(LOG(GREATEST(ABS(x), 1)) / LOG(256)) + 7 AS INT64))) AS bit
));
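CREATE TEMP FUNCTION packedINT(x INT64) AS ((
  -- The first digit lands in the highest nibble; + 15 appends the sign nibble 1111.
  SELECT SUM((y - 48) * (1 << (4 * (LENGTH(CAST(ABS(x) AS STRING)) - OFFSET)))) + 15
  FROM UNNEST(TO_CODE_POINTS(CAST(ABS(x) AS STRING))) AS y WITH OFFSET
));
CREATE TEMP FUNCTION packedINT_(x INT64) AS ((
  -- Reversed nibble order; '?' has code point 63, and 63 - 48 = 15 is the sign nibble.
  SELECT SUM((y - 48) * (1 << (4 * OFFSET)))
  FROM UNNEST(TO_CODE_POINTS(CONCAT(CAST(ABS(x) AS STRING), '?'))) AS y WITH OFFSET
));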
SELECT x, packedINT(x) AS packedint, binary(packedINT(x)) AS bits,
  binary(packedINT_(x)) AS bits_reversed  -- distinct alias; two columns named bits would clash
FROM UNNEST([21544, 123]) AS x
The UDF binary transforms any int value to a bit string. The longer expression inside the UNNEST computes the index of the highest bit to output, rounded up to whole bytes. Each bit is extracted in turn, and the bits are then concatenated with STRING_AGG.
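For comparison, the same bit-string rendering can be sketched in JavaScript. This is an illustrative counterpart, not a translation the answer relies on: the name `toBitString` is hypothetical, and it assumes a non-negative integer input. Like the SQL version, it pads the output to a whole number of bytes.

```javascript
// Hypothetical JavaScript counterpart of the `binary` UDF:
// render a non-negative integer as a bit string padded to whole bytes.
function toBitString(value) {
  const v = BigInt(value); // throws for non-integer numbers
  if (v < 0n) {
    throw new RangeError("expected a non-negative integer");
  }
  const raw = v.toString(2);                   // minimal bit string, "0" for zero
  const width = Math.ceil(raw.length / 8) * 8; // round up to a multiple of 8 bits
  return raw.padStart(width, "0");
}

console.log(toBitString(0x21544f));
// 001000010101010001001111
```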
The UDFs packedINT and packedINT_ first convert the number to a string. The 1111 positive sign indicator can be appended to the string as ?, because bits 1111 = 15 = ASCII("?") - 48. The string is then unnested so that each character is processed separately. The position of the character in the string is given by OFFSET. Depending on the encoding, either the OFFSET or the length of the string minus the OFFSET is needed (the latter is equivalent to reversing the string). This value is multiplied by four, because 4 bits are needed to encode one digit. The shift operator << then computes 2^(4*OFFSET), that is, 16^OFFSET. The ASCII code of the digit is in y; subtracting 48 yields the digits zero to nine. Multiplying each digit by its positional weight 2^(4*OFFSET) and summing over all digits yields the final packed value.
The packed value can be visualized with the binary function to display the bits.
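Since the question also mentions JavaScript, here is a minimal standalone sketch of the same weighted sum. It is not a tested BigQuery UDF: the name `packDigits` and its `reversed` flag are hypothetical, it handles non-negative integers only, and it uses BigInt so that long digit strings stay exact. If you adapt it into a JavaScript UDF, check the BigQuery documentation for which argument types are supported; as far as I know INT64 is not accepted as an input type there, so the value may have to be passed as a STRING.

```javascript
// Hypothetical sketch of the weighted sum behind packedINT and packedINT_.
// Each decimal digit becomes one nibble; 15 (bits 1111) is the sign nibble.
function packDigits(value, reversed = false) {
  const n = BigInt(value); // accepts integers or digit strings
  if (n < 0n) {
    throw new RangeError("expected a non-negative integer");
  }
  // Character code minus 48 yields the digit, as in the SQL version.
  const nibbles = [...n.toString()].map((ch) => BigInt(ch.charCodeAt(0) - 48));
  nibbles.push(15n);

  let packed = 0n;
  nibbles.forEach((nibble, offset) => {
    // Normal order: first digit gets the highest weight. Reversed: the lowest.
    const position = reversed ? offset : nibbles.length - 1 - offset;
    packed += nibble << BigInt(4 * position); // weight 16^position
  });
  return packed;
}

console.log(packDigits(21544).toString(16));       // 21544f
console.log(packDigits(21544, true).toString(16)); // f44512
```

The hexadecimal output makes the nibbles easy to read: every hex digit corresponds to exactly one 4-bit group of the packed value.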
Upvotes: 1