blue-607

Reputation: 11

How to limit amount of memory used to store integers?

When I call sys.getsizeof(4), it returns 14. Assuming this is the same as sizeof() in C, this is unacceptably high.

I would like to use the memory array like a big, raw array of bytes. Memory overhead is of the utmost priority, due to the size of the arrays in the project in question. Portability is a huge issue, too, so dropping into C or using a more exotic library is less than optimal.

Is there a way to force Python to use less memory for a single positive signed byte list or tuple member, using only standard Python 3?

Upvotes: 1

Views: 598

Answers (2)

Kevin J. Chase

Reputation: 3956

(Hat tip to martineau for his comment...)

If you're only concerned with unsigned bytes (values [0, 255]), then the simplest answer might be the built-in bytearray and its immutable sibling, bytes. One potential problem is that these are intended to represent encoded strings (reading from or writing to the outside world), so their default __repr__ is "string-like", not a list of integers:

>>> lst = [0x10, 0x20, 0x30, 0x41, 0x61, 0x7f, 0x80, 0xff]
>>> bytearray(lst)
bytearray(b'\x10 0Aa\x7f\x80\xff')
>>> bytes(lst)
b'\x10 0Aa\x7f\x80\xff'

Note that space, '0', 'A', and 'a' appear literally, while "unprintable" values appear as '\x##' string escape sequences. If you're trying to think of those bytes as a bunch of integers, this is not what you want.
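The string-like display is only a matter of `repr`, though: indexing or iterating a `bytearray` (or `bytes`) yields plain integers. A quick sketch:

```python
ba = bytearray([0x10, 0x20, 0x30, 0x41, 0x61, 0x7f, 0x80, 0xff])

print(list(ba))   # [16, 32, 48, 65, 97, 127, 128, 255] -- plain ints
print(ba[3])      # 65, an int, not a one-character bytes object
ba[0] = 200       # mutable; values must fit in range(256)
```

So if the "string-like" repr is the only objection, you can still treat the data as a sequence of small integers.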

For homogeneous arrays of fixed-width integers or floats (much like in C), use the standard library's array module.

>>> import array

# One megabyte of unsigned 8-bit integers.
>>> a = array.array('B', (n % 2**8 for n in range(2**20)))
>>> len(a)
1048576
>>> a.typecode
'B'
>>> a.itemsize
1
>>> a.buffer_info()  # (memory address, number of elements).
(24936384, 1048576)

>>> a_slice = a[slice(1024, 1040)]  # Can be sliced like a list.
>>> a_slice
array('B', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
>>> type(a_slice)  # Slice is also an array, not a list.
<class 'array.array'>

For more complex data, the struct module packs heterogeneous records, much like C's struct keyword. Unlike in C, though, there's no obvious way to declare an array of structs directly.
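For example, a record of one unsigned byte, one signed int, and one double can be packed into 13 bytes (using `'='` for standard sizes with no padding):

```python
import struct

# '=Bid': standard sizes, no padding -> 1 + 4 + 8 = 13 bytes per record.
record = struct.pack('=Bid', 7, -1234, 3.5)
print(struct.calcsize('=Bid'))        # 13
fields = struct.unpack('=Bid', record)
print(fields)                         # (7, -1234, 3.5)
```

The closest thing to an array of structs is concatenating packed records into one bytes object and walking it with `struct.iter_unpack`.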

These data structures all make use of Python's Buffer Protocol, which (in CPython, at least) allows a Python class to expose its inner C-like array directly to other Python code. If you need to do something complicated, you might have to learn this... or give up and use NumPy.
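A small sketch of the buffer protocol in action: `memoryview` gives zero-copy, writable access to an `array`'s underlying buffer.

```python
import array

a = array.array('B', range(16))
mv = memoryview(a)        # zero-copy view via the buffer protocol

print(mv[4])              # 4 -- reads the array's raw buffer directly
mv[0] = 255               # writes through to the underlying array
print(a[0])               # 255
print(bytes(mv[4:8]))     # b'\x04\x05\x06\x07' -- slicing copies only on request
```

Anything that accepts a buffer (file writes, `socket.send`, struct unpacking, NumPy's `frombuffer`) can consume these objects without copying.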

Upvotes: 2

Paul Panzer

Reputation: 53029

14 strikes me as rather low considering that a Python object must at least have a pointer to its type struct and a refcount.

PyObject

All object types are extensions of this type. This is a type which contains the information Python needs to treat a pointer to an object as an object. In a normal “release” build, it contains only the object’s reference count and a pointer to the corresponding type object. Nothing is actually declared to be a PyObject, but every pointer to a Python object can be cast to a PyObject*. Access to the members must be done by using the macros Py_REFCNT and Py_TYPE.

You will have this overhead for every Python object. The only way to reduce the overhead-to-payload ratio is to carry more payload per object, as in arrays (both the standard library's array module and numpy).

The trick here is that array elements typically are not Python objects, so they can dispense with the refcount and type pointer and occupy just as much memory as the underlying C type.
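To make that concrete, here is a rough comparison (exact byte counts depend on the CPython version and platform, so treat the numbers as illustrative):

```python
import array
import sys

n = 10_000
as_list = list(range(n))                                   # each element is a full PyObject
as_array = array.array('B', (i % 256 for i in range(n)))   # raw unsigned bytes

# A list stores pointers to int objects; count both the list and its elements.
list_total = sys.getsizeof(as_list) + sum(sys.getsizeof(i) for i in as_list)
print(list_total)                   # tens of bytes per element on a 64-bit build
print(sys.getsizeof(as_array))      # close to n bytes plus a small fixed header
```

The array stores roughly one byte per element, while the list pays the refcount-and-type-pointer tax on every integer it holds.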

Upvotes: 1
