Reputation: 4891
I would like to generate a UUID v4 string using the uuid module from the Python standard library. I know I can convert a UUID to str with str(uuid.uuid4()); however, I am trying to understand what the bytes in that class instance mean. When I try to decode those bytes I run into all sorts of problems: either the string is not the one I expect, or an exception is thrown. I think these bytes are UTF-16 encoded, as per the documentation here: https://docs.python.org/3/library/uuid.html#uuid.UUID.bytes
UUID instances have these read-only attributes:
UUID.bytes The UUID as a 16-byte string (containing the six integer fields in big-endian byte order).
However, what I get from those fields is not the UUID I get when casting to str. Why is this happening?
>>> import uuid
>>> my_uuid = uuid.uuid4()
>>> str(my_uuid)
'3f5017be-a314-4bb2-92c0-5135b47f8c45'
>>> my_uuid.bytes.decode('latin1')
'?P\x17¾£\x14K²\x92ÀQ5´\x7f\x8cE'
>>> my_uuid.bytes.decode('utf-8', 'ignore')
'?P\x17\x14KQ5\x7fE'
>>> my_uuid.bytes.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 3: invalid start byte
>>> my_uuid.bytes.decode('utf-16')
'倿븗ᒣ뉋삒㕑羴䖌'
>>> my_uuid.bytes_le.decode('utf-16')
'ើ㽐ꌔ䮲삒㕑羴䖌'
Upvotes: 0
Views: 4770
Reputation: 177665
Decoding is for bytes that represent text, not for binary structures, so do not try to decode them. To inspect the bytes, use .hex():
import uuid
u = uuid.uuid4()
print(u)
print(u.bytes.hex(' '))
print([hex(n) for n in u.fields])
Output:
6ea36117-e3d1-464a-94ee-1571104650a5
6e a3 61 17 e3 d1 46 4a 94 ee 15 71 10 46 50 a5
['0x6ea36117', '0xe3d1', '0x464a', '0x94', '0xee', '0x1571104650a5']
See this Raymond Chen "Old New Thing" blog article about GUIDs for more information, including why the six integer fields are printed as five groups.
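As a rough illustration of how the six fields collapse into five printed groups, here is a minimal sketch. The helper name `groups_from_fields` is made up for this example, and the field names follow the order given in the `uuid.UUID.fields` documentation. The two one-byte clock-sequence fields are printed together as the fourth group.

```python
import uuid


def groups_from_fields(u: uuid.UUID) -> str:
    """Rebuild the canonical string from the six integer fields (illustrative helper)."""
    (time_low, time_mid, time_hi_version,
     clock_seq_hi_variant, clock_seq_low, node) = u.fields
    # Fields 4 and 5 are one byte each and share the fourth printed group.
    return "%08x-%04x-%04x-%02x%02x-%012x" % (
        time_low, time_mid, time_hi_version,
        clock_seq_hi_variant, clock_seq_low, node,
    )


u = uuid.UUID("6ea36117-e3d1-464a-94ee-1571104650a5")
print(groups_from_fields(u))
```

For any UUID this should give the same text as str(u), which is the easiest way to check it.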
Upvotes: 2
Reputation: 311606
The bytes don't "mean" anything; they are -- as the description says -- "six integer fields in big-endian byte order". You would not expect them to decode meaningfully into strings.
You can use the struct module to unpack the individual values and convert them yourself. For example:
>>> import uuid
>>> import struct
>>> my_uuid = uuid.uuid4()
>>> my_uuid
UUID('11a0a5e5-6733-47c3-9a2d-8cd1c7d01e2c')
>>> [hex(i) for i in struct.unpack('>8H', my_uuid.bytes)]
['0x11a0', '0xa5e5', '0x6733', '0x47c3', '0x9a2d', '0x8cd1', '0xc7d0', '0x1e2c']
You can see the hex values returned in the above list correspond to the string value of the UUID.
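To go all the way from the raw bytes back to the familiar string, one possible sketch is to hex-encode the 16 bytes and insert hyphens at the 8-4-4-4-12 boundaries. The function name `format_uuid_bytes` is hypothetical, and this is only meant to show the relationship, not to replace str().

```python
import uuid


def format_uuid_bytes(raw: bytes) -> str:
    """Turn 16 big-endian UUID bytes into the 8-4-4-4-12 hex form (illustrative helper)."""
    if len(raw) != 16:
        raise ValueError("a UUID is exactly 16 bytes")
    h = raw.hex()  # 32 lowercase hex digits, two per byte
    return "-".join((h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]))


my_uuid = uuid.UUID("11a0a5e5-6733-47c3-9a2d-8cd1c7d01e2c")
print(format_uuid_bytes(my_uuid.bytes))
```

Note this uses .bytes, not .bytes_le; the little-endian variant swaps the byte order of the first three fields, so the same slicing would not reproduce str(my_uuid).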
Upvotes: 1