Reputation: 11
When I call sys.getsizeof(4), it returns 14. Assuming this is the same as sizeof() in C, this is unacceptably high.
I would like to use memory like a big, raw array of bytes. Minimizing memory overhead is the top priority, given the size of the arrays in this project. Portability is a major concern too, so dropping into C or using a more exotic library is less than optimal.
Is there a way to force Python to use less memory for a single positive signed byte list or tuple member, using only standard Python 3?
Upvotes: 1
Views: 598
Reputation: 3956
(Hat tip to martineau for his comment...)
If you're only concerned with unsigned bytes (values [0, 255]), then the simplest answer might be the built-in bytearray and its immutable sibling, bytes.
One potential problem is that these are intended to represent encoded strings (reading from or writing to the outside world), so their default __repr__ is "string-like", not a list of integers:
>>> lst = [0x10, 0x20, 0x30, 0x41, 0x61, 0x7f, 0x80, 0xff]
>>> bytearray(lst)
bytearray(b'\x10 0Aa\x7f\x80\xff')
>>> bytes(lst)
b'\x10 0Aa\x7f\x80\xff'
Note that the space, '0', 'A', and 'a' appear literally, while "unprintable" values appear as '\x##' string escape sequences.
If you're trying to think of those bytes as a bunch of integers, this is not what you want.
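The repr is only a display issue, though; in Python 3, indexing or iterating a bytes or bytearray object yields plain integers. A minimal sketch:

```python
lst = [0x10, 0x20, 0x30, 0x41, 0x61, 0x7f, 0x80, 0xff]
b = bytes(lst)

# Indexing and iterating yield ints, not one-character strings.
print(b[0])       # 16
print(list(b))    # [16, 32, 48, 65, 97, 127, 128, 255]
print(b.hex())    # '10203041617f80ff'
```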
For homogeneous arrays of fixed-width integers or floats (much like in C), use the standard library's array module.
>>> import array
>>> # One megabyte of unsigned 8-bit integers.
>>> a = array.array('B', (n % 2**8 for n in range(2**20)))
>>> len(a)
1048576
>>> a.typecode
'B'
>>> a.itemsize
1
>>> a.buffer_info() # Memory address, item count.
(24936384, 1048576)
>>> a_slice = a[slice(1024, 1040)] # Can be sliced like a list.
>>> a_slice
array('B', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
>>> type(a_slice) # Slice is also an array, not a list.
<class 'array.array'>
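To get a rough sense of the savings (the exact numbers vary with platform and CPython build, so treat this as a sketch), compare the container cost of a list of ints with an array of 'B' values:

```python
import array
import sys

n = 2 ** 16
as_list = [i % 256 for i in range(n)]
as_array = array.array('B', as_list)

# getsizeof on a list counts only the pointer slots, not the int
# objects they point to, so the true cost of the list is even higher.
print(sys.getsizeof(as_array) / n)  # roughly 1 byte per element
print(sys.getsizeof(as_list) / n)   # roughly 8 bytes per pointer on 64-bit
```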
For more complex data, the struct module is for packing heterogeneous records, much like C's struct keyword. Unlike C, I don't see any obvious way to create an array of structs.
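That said, you can emulate an array of structs by hand: pack fixed-size records into a bytearray with struct.pack_into and read them back with struct.unpack_from. A sketch (the '<BHf' record layout here is just an example):

```python
import struct

# A C-like record: unsigned byte, unsigned short, 32-bit float,
# little-endian with no padding.
rec = struct.Struct('<BHf')
count = 3
buf = bytearray(rec.size * count)

for i in range(count):
    rec.pack_into(buf, i * rec.size, i, i * 10, i * 0.5)

records = [rec.unpack_from(buf, i * rec.size) for i in range(count)]
print(records)  # [(0, 0, 0.0), (1, 10, 0.5), (2, 20, 1.0)]
```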
These data structures all make use of Python's Buffer Protocol, which (in CPython, at least) allows a Python class to expose its inner C-like array directly to other Python code. If you need to do something complicated, you might have to learn this... or give up and use NumPy.
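For a small taste of the buffer protocol without leaving the standard library, memoryview wraps any buffer-supporting object and shares its bytes without copying. A sketch:

```python
import array

a = array.array('B', range(16))
view = memoryview(a)

# Writing through the view mutates the array in place; no copy was made.
view[0] = 200
print(a[0])   # 200

# Slicing a memoryview yields another zero-copy window onto the same buffer.
sub = view[4:8]
sub[0] = 99
print(a[4])   # 99
```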
Upvotes: 2
Reputation: 53029
14 strikes me as rather low, considering that a Python object must at least have a pointer to its type struct and a refcount. Quoting the Python C API documentation:
PyObject
All object types are extensions of this type. This is a type which contains the information Python needs to treat a pointer to an object as an object. In a normal “release” build, it contains only the object’s reference count and a pointer to the corresponding type object. Nothing is actually declared to be a PyObject, but every pointer to a Python object can be cast to a PyObject*. Access to the members must be done by using the macros Py_REFCNT and Py_TYPE.
You will have this overhead for every Python object. The only way to reduce the overhead-to-payload ratio is to have more payload, as for example in arrays (both the standard library's array module and numpy).
The trick here is that array elements typically are not Python objects, so they can dispense with the refcount and type pointer and occupy only as much memory as the underlying C type.
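You can see the difference directly (the exact getsizeof figure is CPython- and platform-specific, so the numbers here are only indicative):

```python
import array
import sys

# A small int as a standalone Python object carries the refcount and
# type-pointer overhead, plus the digits themselves.
print(sys.getsizeof(4))   # typically 28 on a 64-bit CPython 3

# Stored inside an array('B'), the same value is one raw C unsigned char.
a = array.array('B', [4])
print(a.itemsize)         # 1
```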
Upvotes: 1