SimonGan

Reputation: 29

Convert a normal value to an IBM "packed decimal"

Could anyone help me write a sample BigQuery UDF, using SQL or JavaScript, that converts a decimal value to packed decimal?

Upvotes: 0

Views: 119

Answers (1)

Samuel

Reputation: 3538

An IBM packed decimal integer is described here. A byte consists of 8 bits, and a digit from zero to nine needs 4 bits, so one byte can hold two digits from zero to nine.

The linked page gives an example: the value 21544 is encoded as the bit sequence:

0010-0001-0101-0100-0100-1111
2-1-5-4-4-positive sign
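The layout above can be reproduced in JavaScript, which the question also mentions. This is a minimal sketch: the function name `packedBits` is hypothetical, it only handles non-negative integers, and, like the example, it always uses F (1111) as the sign.

```javascript
// Hypothetical helper: returns the packed-decimal nibbles of a
// non-negative integer as 4-bit groups joined with "-".
// Assumption: the sign nibble is always F (1111), as in the example.
function packedBits(value) {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("expects a non-negative safe integer");
  }
  // One nibble per decimal digit, followed by the sign nibble F.
  const nibbles = [...String(value)].map(Number);
  nibbles.push(0xf);
  return nibbles.map((n) => n.toString(2).padStart(4, "0")).join("-");
}

console.log(packedBits(21544)); // 0010-0001-0101-0100-0100-1111
```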

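I could not find the negative sign. However, for positive values the UDF packedINT should produce this encoding. Please check whether your target expects the digits in big- or little-endian order; for the reverse order, the UDF packedINT_ may be needed. Note that packedINT_ reverses the order of the 4-bit digits, not the order of whole bytes.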

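-- binary: returns the bits of x as a string, zero-padded to whole bytes.
CREATE TEMP FUNCTION binary(x int64) AS ((
    SELECT STRING_AGG(CAST((x >> bit) & 0x1 AS string), '' ORDER BY bit DESC)
    FROM UNNEST(GENERATE_ARRAY(0, cast(8*floor(log(greatest(abs(x),1))/log(256))+7 as int64))) AS bit
));
-- packedINT: first digit in the highest 4 bits, sign 1111 (15) in the lowest 4 bits.
CREATE TEMP FUNCTION packedINT(x int64) as ((
  SELECT sum((y-48) * (1 << (4*((length(abs(x)||'')-OFFSET))))) + 15
  FROM UNNEST(TO_CODE_POINTS(''||ABS(x))) AS y WITH OFFSET
));
-- packedINT_: digits in reverse order; '?' (ASCII 63, 63 - 48 = 15) supplies the sign nibble.
CREATE TEMP FUNCTION packedINT_(x int64) as ((
  SELECT sum((y-48) * (1 << (4*OFFSET)))
  FROM UNNEST(TO_CODE_POINTS(''||ABS(x)||'?')) AS y WITH OFFSET
));

-- Output columns need distinct names, so the second bit string is aliased bits_reversed.
SELECT x, packedINT(x) AS packedint, binary(packedINT(x)) AS bits,
  binary(packedINT_(x)) AS bits_reversed
FROM UNNEST([21544, 123]) AS x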

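The UDF binary converts any INT64 value to a bit string. The longer expression inside UNNEST computes how many bits to print: the number of whole bytes needed for x (derived from its logarithm to base 256), times 8. Each bit is then extracted with a shift and & 0x1, and the bits are concatenated, most significant first, with STRING_AGG.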
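The same idea as a JavaScript sketch, using a hypothetical helper `binaryString`: it prints all bits of a non-negative integer, zero-padded to whole bytes like the SQL UDF. It uses BigInt so that values beyond 2^53 stay exact, and it counts bytes with shifts instead of relying on floating-point logarithms.

```javascript
// Hypothetical JavaScript counterpart of the SQL UDF `binary`:
// returns the bits of a non-negative integer, padded to whole bytes.
function binaryString(x) {
  const v = BigInt(x);
  if (v < 0n) throw new RangeError("expects a non-negative integer");
  // Count the bytes needed (at least one) by shifting, not with logarithms.
  let bytes = 1;
  while (v >> BigInt(8 * bytes) > 0n) bytes++;
  return v.toString(2).padStart(8 * bytes, "0");
}

console.log(binaryString(0x21544f)); // 001000010101010001001111
```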

The UDFs packedINT and packedINT_ first convert the number to a string, which is then unnested so that each character is processed separately. The ASCII code of the character is in y; subtracting 48 yields the digit from zero to nine. The position of the character in the string, and thus of the digit in the number, is given by OFFSET. packedINT uses the length of the string minus OFFSET, which puts the first digit in the highest position; packedINT_ uses OFFSET directly, which is equivalent to reversing the string. This position is multiplied by four, because each digit needs 4 bits, and the shift 1 << (4*OFFSET) equals 2^(4*OFFSET) = 16^OFFSET. Multiplying each digit by its positional factor and summing over all characters yields the packed value. The 1111 positive sign indicator is handled in two ways: packedINT adds 15 to the sum, filling the lowest 4 bits that the length-minus-OFFSET shift leaves free, while packedINT_ appends it to the string as ?, because 1111 = 15 = ASCII("?") - 48.
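The arithmetic of the two UDFs can be checked outside BigQuery with this JavaScript sketch. The names `packedInt` and `packedIntReversed` are hypothetical mirrors of packedINT and packedINT_. BigInt is used because `1 << n` on ordinary JavaScript numbers works on 32-bit integers only.

```javascript
// Mirrors packedINT: digit k (0-based OFFSET) of an n-digit number is
// shifted left by 4 * (n - k) bits; the low nibble receives the sign 15 (F).
function packedInt(x) {
  const s = (x < 0n ? -x : x).toString();
  let sum = 0n;
  for (let offset = 0; offset < s.length; offset++) {
    const digit = BigInt(s.charCodeAt(offset) - 48); // ASCII '0' is 48
    sum += digit << BigInt(4 * (s.length - offset));
  }
  return sum + 15n;
}

// Mirrors packedINT_: append "?" (ASCII 63, and 63 - 48 = 15) as the sign,
// then shift character k by 4 * k bits, so the digit order is reversed.
function packedIntReversed(x) {
  const s = (x < 0n ? -x : x).toString() + "?";
  let sum = 0n;
  for (let offset = 0; offset < s.length; offset++) {
    sum += BigInt(s.charCodeAt(offset) - 48) << BigInt(4 * offset);
  }
  return sum;
}

console.log(packedInt(21544n).toString(16));         // 21544f
console.log(packedIntReversed(21544n).toString(16)); // f44512
```

For 21544 the first value matches the bit sequence from the example; the second puts the first digit in the lowest 4 bits, which is a reversal of the digits rather than a byte swap.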

The packed value can be visualized with the binary function, which displays its bits.
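The UDFs above cover positive values only. For reference, in IBM's packed decimal format the negative sign nibble is D (1101); C (1100) is the preferred positive sign, and F (1111), as in the example, is also accepted as positive. The JavaScript sketch below uses a hypothetical helper `toPackedBytes` (not a BigQuery function) that applies these sign codes. Because a packed field always occupies whole bytes, it adds a leading zero digit when the digit count is even. Whether your target system expects C or F for positive values is an assumption you should check.

```javascript
// Hypothetical helper: encode an integer as IBM packed decimal bytes.
// Assumption: C (1100) for positive, D (1101) for negative; pass
// positiveSign = 0xf to reproduce the F sign used in the example.
function toPackedBytes(value, positiveSign = 0xc) {
  const n = BigInt(value);
  const sign = n < 0n ? 0xd : positiveSign;
  const digits = (n < 0n ? -n : n).toString();
  // Digits plus the sign nibble must fill whole bytes: pad to an odd digit count.
  const padded = digits.length % 2 === 0 ? "0" + digits : digits;
  const nibbles = [...padded].map(Number);
  nibbles.push(sign);
  const bytes = new Uint8Array(nibbles.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    // High nibble first: the most significant digit comes first.
    bytes[i] = (nibbles[2 * i] << 4) | nibbles[2 * i + 1];
  }
  return bytes;
}

// Formats bytes as upper-case hex pairs for inspection.
const hex = (bytes) =>
  [...bytes].map((b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");

console.log(hex(toPackedBytes(21544, 0xf))); // 21 54 4F
console.log(hex(toPackedBytes(-123)));       // 12 3D
console.log(hex(toPackedBytes(1234)));       // 01 23 4C
```

To call such logic from BigQuery, the function body could be placed in a JavaScript UDF (`LANGUAGE js`); taking and returning STRING values sidesteps JavaScript's lack of a 64-bit integer number type.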

Upvotes: 1
