Casio

Reputation: 13

Split UTF-8 string into bytes in Python

I am trying to split a UTF-8 string into bytes in Python 3. The problem is that when I use bytearray, bytes, encode, etc., I always get an array whose elements are 14 bytes in size, not 1 byte as I expected. I need to split any text file into a sequence of bytes and send them one byte at a time over sockets. I tried something like this:

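import sys

# file holds the path to the input text file
with open(file, "r", encoding="utf-8") as infile:
    text = infile.read()  # whole file as a str
byte_str = text.encode("utf-8")  # encode the text to a bytes object
print("size of byte_str", sys.getsizeof(byte_str[0]))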

The print outputs 14, but I expected 1. Any suggestions?

Upvotes: 0

Views: 383

Answers (1)

Łukasz Rogalski

Reputation: 23223

Quoting official documentation:

sys.getsizeof(object[, default])

Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.

If given, default will be returned if the object does not provide means to retrieve the size. Otherwise a TypeError will be raised.

getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

See recursive sizeof recipe for an example of using getsizeof() recursively to find the size of containers and all their contents.
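
To connect the quoted documentation to the question: indexing a bytes object in Python 3 returns an int, so sys.getsizeof(byte_str[0]) reports the memory footprint of that int object (platform dependent), not the one byte of data it represents. A minimal sketch follows, assuming the goal is a list of one-byte chunks; the helper name split_into_single_bytes and the sample text are made up for illustration and are not from the question.

```python
import sys


def split_into_single_bytes(text, encoding="utf-8"):
    """Encode text and return a list of length-1 bytes objects (hypothetical helper)."""
    data = text.encode(encoding)
    # Slicing (data[i:i + 1]) keeps the bytes type;
    # plain indexing (data[i]) would return an int instead.
    return [data[i:i + 1] for i in range(len(data))]


data = "héllo".encode("utf-8")

print(len(data))               # number of encoded bytes: 6 ("é" takes two bytes)
print(type(data[0]))           # <class 'int'>
print(sys.getsizeof(data[0]))  # size of the int object itself, varies by platform

chunks = split_into_single_bytes("héllo")
print(all(len(chunk) == 1 for chunk in chunks))  # True: every chunk is one byte long
```

Each element of chunks is a bytes object of length 1, which socket methods such as send() and sendall() accept. If sending byte by byte is not a hard requirement, passing the whole encoded buffer to sendall() is usually simpler.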

Upvotes: 1
