TPPZ

Reputation: 4891

Decode UUID 4 as a Python string

I would like to generate a UUID v4 string using the uuid module from the Python standard library.

I know I can convert a UUID to a str with str(uuid.uuid4()), but I am trying to understand what the bytes in that class instance mean. When I try to decode those bytes I get all sorts of errors: either the string is not the one I expect, or an exception is thrown. I think these bytes are UTF-16 encoded, going by the documentation at https://docs.python.org/3/library/uuid.html#uuid.UUID.bytes

UUID instances have these read-only attributes:

UUID.bytes The UUID as a 16-byte string (containing the six integer fields in big-endian byte order).

However, what I get from decoding those bytes is not the UUID I get when casting to str. Why is this happening?

>>> import uuid
>>> my_uuid = uuid.uuid4()
>>> str(my_uuid)
'3f5017be-a314-4bb2-92c0-5135b47f8c45'
>>> my_uuid.bytes.decode('latin1')
'?P\x17¾£\x14K²\x92ÀQ5´\x7f\x8cE'
>>> my_uuid.bytes.decode('utf-8', 'ignore')
'?P\x17\x14KQ5\x7fE'
>>> my_uuid.bytes.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 3: invalid start byte
>>> my_uuid.bytes.decode('utf-16')
'倿븗ᒣ뉋삒㕑羴䖌'
>>> my_uuid.bytes_le.decode('utf-16')
'ើ㽐ꌔ䮲삒㕑羴䖌'

Upvotes: 0

Views: 4770

Answers (2)

Mark Tolonen

Reputation: 177665

Decoding bytes is for text, not structures, so do not try to decode them. To inspect the bytes, use .hex() (the separator argument requires Python 3.8 or later):

import uuid
u = uuid.uuid4()
print(u)
print(u.bytes.hex(' '))
print([hex(n) for n in u.fields])

Output:

6ea36117-e3d1-464a-94ee-1571104650a5
6e a3 61 17 e3 d1 46 4a 94 ee 15 71 10 46 50 a5
['0x6ea36117', '0xe3d1', '0x464a', '0x94', '0xee', '0x1571104650a5']

See this Raymond Chen "Old New Thing" blog article about GUIDs for more information, including why the six integer fields are printed as five groups.
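As a rough illustration of how the six fields collapse into five printed groups, here is a minimal sketch that rebuilds the string from u.fields. The variable names follow the field names in the uuid documentation; the fixed UUID is just the one from the output above, and the format string is my own reconstruction, not how the library does it internally.

```python
import uuid

# Fixed value taken from the output above so the result is reproducible.
u = uuid.UUID("6ea36117-e3d1-464a-94ee-1571104650a5")

# The six integer fields, in the order they appear in u.bytes.
(time_low, time_mid, time_hi_version,
 clock_seq_hi_variant, clock_seq_low, node) = u.fields

# Zero-pad each field to its full width in hex digits.
# The two 1-byte clock_seq fields are printed together as one 4-digit group,
# which is why six fields show up as five groups.
text = (
    f"{time_low:08x}-{time_mid:04x}-{time_hi_version:04x}-"
    f"{clock_seq_hi_variant:02x}{clock_seq_low:02x}-{node:012x}"
)

print(text)  # 6ea36117-e3d1-464a-94ee-1571104650a5
```

The zero padding matters: a field such as 0x0a5e printed without it would drop a digit and shift the rest of the string.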

Upvotes: 2

larsks

Reputation: 311606

The bytes don't "mean" anything; they are -- as the description says -- "six integer fields in big-endian byte order". You would not expect them to decode meaningfully into strings.

You can use the struct module to unpack the individual values and convert them yourself. For example:

>>> import uuid
>>> import struct
>>> my_uuid = uuid.uuid4()
>>> my_uuid
UUID('11a0a5e5-6733-47c3-9a2d-8cd1c7d01e2c')
>>> [hex(i) for i in struct.unpack('>8H', my_uuid.bytes)]
['0x11a0', '0xa5e5', '0x6733', '0x47c3', '0x9a2d', '0x8cd1', '0xc7d0', '0x1e2c']

You can see the hex values returned in the above list correspond to the string value of the UUID.
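If the goal is to get from the raw 16 bytes back to the familiar string, a sketch along these lines should work. The helper name bytes_to_uuid_text is made up for this example, and the '>IHHBB6s' format is one way (my assumption, not the only one) to split the bytes into the six documented fields rather than eight equal 16-bit chunks.

```python
import struct
import uuid

# Fixed value taken from the session above so the result is reproducible.
u = uuid.UUID("11a0a5e5-6733-47c3-9a2d-8cd1c7d01e2c")
raw = u.bytes  # 16 raw bytes, not encoded text


def bytes_to_uuid_text(raw: bytes) -> str:
    """Hypothetical helper: hex-encode 16 bytes and group them 8-4-4-4-12."""
    h = raw.hex()
    return "-".join((h[:8], h[8:12], h[12:16], h[16:20], h[20:]))


text = bytes_to_uuid_text(raw)
print(text)  # 11a0a5e5-6733-47c3-9a2d-8cd1c7d01e2c

# Big-endian: 32-bit, 16-bit, 16-bit, 8-bit, 8-bit, then 6 bytes for the
# 48-bit node (struct has no 48-bit integer code, so convert it separately).
*head, node_bytes = struct.unpack(">IHHBB6s", raw)
fields = (*head, int.from_bytes(node_bytes, "big"))
print(fields == u.fields)  # True

# The constructor also accepts the raw bytes directly.
print(uuid.UUID(bytes=raw) == u)  # True
```

In practice uuid.UUID(bytes=raw) followed by str() is the simplest route; the manual version is only there to show what the bytes contain.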

Upvotes: 1
