Daren Schwenke
Daren Schwenke

Reputation: 5478

Oracle GUIDTORAW and RAWTOGUID function endianness

My app internally uses RAW(16) guids generated via sys_guid().
Recently I started associating Active Directory users where I get guid's in the format of: 00000000-0000-0000-0000-00000000000

Are the following two functions doing this conversion correctly?

Found the following site:

http://www.timvasil.com/blog14/post/2009/01/20/User-defined-function-to-convert-from-RAW(16)-to-a-GUID-in-Oracle.aspx

create or replace
FUNCTION RAWTOGUID
( RawData IN RAW
) RETURN VARCHAR AS

BEGIN

declare HexData varchar(32) := rawtohex(RawData);

begin
return
    substr(HexData, 7, 2) 
    || substr(HexData, 5, 2) 
    || substr(HexData, 3, 2) 
    || substr(HexData, 1, 2) 
    || '-'
    || substr(HexData, 11, 2) 
    || substr(HexData, 9, 2) 
    || '-'
    || substr(HexData, 15, 2) 
    || substr(HexData, 13, 2) 
    || '-'
    || substr(HexData, 17, 4) 
    || '-'
    || substr(HexData, 21, 12);
end;

END RAWTOGUID;

Adding in the following site:

http://dbaspot.com/oracle-server/69226-guid-char-conversion-function.html

Came up with this function to do the inverse:

create or replace
FUNCTION GUIDTORAW
( HexData IN VARCHAR
) RETURN RAW AS

BEGIN

declare StringData varchar(32) := TRANSLATE(HexData,'0{-}','0');

begin

return
    hextoraw(substr(StringData, 7, 2) 
    || substr(StringData, 5, 2) 
    || substr(StringData, 3, 2) 
    || substr(StringData, 1, 2)
    || substr(StringData, 11, 2) 
    || substr(StringData, 9, 2) 
    || substr(StringData, 15, 2) 
    || substr(StringData, 13, 2)
    || substr(StringData, 17, 4)
    || substr(StringData, 21, 12));
end;

END GUIDTORAW;

They convert back and forth, but am I actually respecting the endianness or have the right order at all?

Upvotes: 5

Views: 2894

Answers (1)

Danny Varod
Danny Varod

Reputation: 18068

Quoting from the UUID standard:

The structure of UUIDs is:

   Field                  Data Type     Octet  Note
                                        #

   time_low               unsigned 32   0-3    The low field of the
                          bit integer          timestamp

   time_mid               unsigned 16   4-5    The middle field of the
                          bit integer          timestamp

   time_hi_and_version    unsigned 16   6-7    The high field of the
                          bit integer          timestamp multiplexed
                                               with the version number

   clock_seq_hi_and_rese  unsigned 8    8      The high field of the
   rved                   bit integer          clock sequence
                                               multiplexed with the
                                               variant

   clock_seq_low          unsigned 8    9      The low field of the
                          bit integer          clock sequence

   node                   unsigned 48   10-15  The spatially unique
                          bit integer          node identifier

In the absence of explicit application or presentation protocol specification to the contrary, a UUID is encoded as a 128-bit object, as follows:

The fields are encoded as 16 octets, with the sizes and order of the fields defined above, and with each field encoded with the Most Significant Byte first (known as network byte order). Note that the field names, particularly for multiplexed fields, follow historical practice.

   0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          time_low                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       time_mid                |         time_hi_and_version   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |clk_seq_hi_res |  clk_seq_low  |         node (0-1)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         node (2-5)                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Quoting from the Guid documentation:

public Guid(int a, short b, short c, byte[] d)

Guid(1,2,3,new byte[]{0,1,2,3,4,5,6,7})
creates a Guid that corresponds to "00000001-0002-0003-0001-020304050607".

According to Wikipedia's Endianness article Windows stores numbers in little endian --> least significant byte first.

The mapping 1,2,3,new byte[]{0,1,2,3,4,5,6,7} to "00000001-0002-0003-0001-020304050607", shows us that the numbers are displayed big-endian, as in the UUID standard, however, the byte array is provided in the same order as display - no need to swap the bytes.

So Guids are displayed as:

{time_low (4B) - time_mid (2B) - time_hi_and_version (2B) - clock_sq_hi_and_reserved(1B), clock_seq_low (1B) - node (6B)}

In little endian this results in byte order (byte[] does not count as a number as is therefore :

{3,2,1,0 - 5,4 - 7,6 - 8,9 - 10,11,12,13,14,15}

which results in hexidecimal character order (each byte is 2 hexidecimal digits):

{6,7,4,5,2,3,0,1 - 10,11,8,9 - 14,15,12,13 - 16,17,18,19 - 20,21,22,23,24,25,26,27,28,29,30,31}

In Oracle, the substr string function is 1-based, therefore the Oracle string indexes are:

{7,8,5,6,3,4,1,2 - 11,12,9,10 - 15,16,13,14 - 17,18,19,20 - 21,22,23,24,25,26,27,28,29,30,31,32}

Which results in the command

substr(HexData, 7, 2) 
|| substr(HexData, 5, 2) 
|| substr(HexData, 3, 2) 
|| substr(HexData, 1, 2) 
|| '-'
|| substr(HexData, 11, 2) 
|| substr(HexData, 9, 2) 
|| '-'
|| substr(HexData, 15, 2) 
|| substr(HexData, 13, 2) 
|| '-'
|| substr(HexData, 17, 4) 
|| '-'
|| substr(HexData, 21, 12);

And transposed (after removing '-'s):

{7,8,5,6,3,4,1,2, 11,12,9,10, 15,16,13,14, 17,18,19,20, 21,22,23,24,25,26,27,28,29,30,31,32}

is reversed back into

{1,2,3,4,5,6,7,8, 9,10,11,12, 13,14,15,16, 17,18,19,20, 21,22,23,24,25,26,27,28,29,30,31,32}

using the same function (without adding '-'s) - converting BE to LE and LE to BE have the same swaps since the bytes are simply reversed and reversing reversed bytes results in non-reversed bytes.

Upvotes: 6

Related Questions