user8811409

Reputation: 569

BigQuery + JavaScript UDF - Not able to manipulate byte array from input

I'm noticing a discrepancy between a JavaScript function run in Node and the same function run as a JavaScript UDF in BigQuery.

I am running the following in BigQuery:

CREATE TEMP FUNCTION testHash(md5Bytes BYTES)
RETURNS BYTES 
LANGUAGE js AS """
md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;
return md5Bytes
""";

SELECT TO_HEX(testHash(MD5("test_phrase")));

The output is cb5012e39277d48ef0b5c88bded48591, which is incorrect.

Running the same code in Node gives cb5012e39277348eb0b5c88bded48591, which is the expected value. Notice that two of the hex characters differ: counting from 0, byte 6 is 0x34 instead of 0xd4, and byte 8 is 0xb0 instead of 0xf0.
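
The Node code itself is not shown. A minimal sketch of what it presumably looks like, assuming Node's built-in crypto module computes the digest, applies the same four statements to the 16-byte Buffer returned by digest(). A Buffer is a mutable byte array, so the compound assignments change bytes 6 and 8 in place. The function name setVersionBits is hypothetical. The masks appear to set the version and variant bits of an RFC 4122 version-3 (MD5-based) UUID: the high nibble of byte 6 becomes 3 and the top two bits of byte 8 become binary 10.

```javascript
const { createHash } = require("crypto");

// Hypothetical helper: apply the UDF's bit manipulation to a mutable byte array.
function setVersionBits(bytes) {
  bytes[6] &= 0x0f; // clear the high nibble of byte 6
  bytes[6] |= 0x30; // set it to 3 (version nibble in a v3 UUID layout)
  bytes[8] &= 0x3f; // clear the top two bits of byte 8
  bytes[8] |= 0x80; // set them to binary 10 (RFC 4122 variant)
  return bytes;
}

// digest() with no encoding argument returns a 16-byte Buffer.
const digest = createHash("md5").update("test_phrase").digest();
console.log(setVersionBits(digest).toString("hex"));
```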

I've narrowed the issue down to BigQuery not actually applying the bitwise operators: if I skip these four lines in Node, I get the same incorrect output that BigQuery returns:

md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;

Any ideas why the bitwise operators are not being applied to the md5Bytes input to the UDF?
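
One point worth checking: BigQuery's JavaScript UDF documentation lists BYTES as being passed to JavaScript as a base64-encoded STRING, and a BYTES return value as a base64-encoded string. If that encoding applies here, md5Bytes inside the UDF is a string, not a byte array. JavaScript strings are immutable, so the four compound assignments leave it unchanged, and returning it hands back the original digest, which would match the Node output without the bitwise lines. The sketch below decodes the base64 string into a Uint8Array, applies the same masks, and re-encodes the result. All helper names (base64ToBytes, bytesToBase64, setVersionBits, testHashBody) are hypothetical, and the codec is plain JavaScript so it relies on neither Node's Buffer nor atob/btoa being available in the UDF runtime.

```javascript
// Standard base64 alphabet (RFC 4648); '=' is used for padding.
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: decode a base64 string into a Uint8Array.
function base64ToBytes(b64) {
  const bytes = [];
  let buffer = 0; // pending bits not yet emitted as a byte
  let bits = 0;   // number of pending bits held in `buffer`
  for (const ch of b64.replace(/=+$/, "")) {
    const v = B64.indexOf(ch);
    if (v < 0) throw new Error("invalid base64 character: " + ch);
    buffer = (buffer << 6) | v;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >> bits) & 0xff);
      buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return Uint8Array.from(bytes);
}

// Hypothetical helper: encode a byte array as padded base64.
function bytesToBase64(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2; // 24 bits -> four 6-bit digits
    out += B64[(triple >> 18) & 63];
    out += B64[(triple >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(triple >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[triple & 63] : "=";
  }
  return out;
}

// Same four statements as the UDF, applied to a real byte array.
function setVersionBits(bytes) {
  bytes[6] &= 0x0f;
  bytes[6] |= 0x30;
  bytes[8] &= 0x3f;
  bytes[8] |= 0x80;
  return bytes;
}

// Hypothetical UDF body: base64 string in, base64 string out.
function testHashBody(md5Bytes) {
  return bytesToBase64(setVersionBits(base64ToBytes(md5Bytes)));
}
```

Under that assumption, the UDF body would define these helpers and end with return testHashBody(md5Bytes); in place of the four statements and the original return. Treat this as a sketch that has not been run inside BigQuery.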

Upvotes: 0


Answers (1)

Shipra Sarkar

Reputation: 1475

Bitwise operations in BigQuery JavaScript UDFs handle only the most significant 32 bits, as listed in the limitations section of the JavaScript UDF documentation. MD5 is a hash function that maps any input to a fixed-length 16-byte (128-bit) digest. Because the UDF's bitwise operations are limited to 32 bits, applying them to the 128-bit MD5 value gives unexpected output.
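
The 32-bit behaviour behind this limitation can be seen in plain JavaScript: the bitwise operators first convert each operand to a 32-bit integer (the ECMAScript ToInt32 conversion), which keeps only the low-order 32 bits of the integer value. The short sketch below shows a 33-bit value being truncated and 2^31 wrapping to a negative signed value.

```javascript
// Bitwise operators convert operands with ToInt32, discarding higher-order bits.
const big = 2 ** 32 + 5;        // needs 33 bits to represent
const truncated = big | 0;      // only the low 32 bits survive
const wrapped = (2 ** 31) | 0;  // 2^31 does not fit in a signed 32-bit integer

console.log(truncated); // 5
console.log(wrapped);   // -2147483648
```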

Upvotes: 1
