Reputation: 950
Is there a python module that allows me to store a number in a significantly smaller object than int
?
My memory is not big enough for some of my data structures, so I would like to represent numbers using as little space as possible.
The problem is I store the values in a tree-structure:
All integers are either keys or values. Maybe then the actual size won't be reduced much when using smaller variables - I don't know.
Upvotes: 4
Views: 2538
Reputation: 1
Your task is clearly directed towards a memory light-weight footprint of some 2B
-values.
The solution in Python has a few more things to take into account.
Assuming the range of values, meeting the range-of-values represented within a 2B
-scope of int
-s ( a range of numbers 0..65535
) to be memory-efficient is just an illusion.
Your design shall take into account, as a minimum, the overheads of associated with an object's "internal_representation" + the overheads associated with object-instance's access-methods, to get a number "there" and "back" ( not speaking about performance impact, as your requirement was focused on low-memory-footprint, but the smarter are the proxy-data-structures, the bigger typically grow the data-structure access/maintenance CPU-costs ).
If your main motivation was to design for a low-memory foot-print, the worst-case scenario shall sum up all these static memory-allocations + the requirements for any dynamic-allocations, requested during the access-methods' and object-modifications' operations + the size of the import
-ed module(s) sizes.
Once speaking about numpy
-- a great tool -- you would get a great & powerful library for fast, vectorised array manipulations, with a given cell-data representation ( dtype = numpy.uint8
).
The hidden-part ( for any assumption on the code-design low-memory-footprint ) is to take into account the overall memory-cost of the approach.
Compare not just an object size but also the Class-related overhead ( immense list of numpy methods not shown for clarity ):
>>> anIntOBJECT= 8
>>> anIntOBJECT.__sizeof__() # ____________________________instance-method works
12
>>> sys.getsizeof(anIntOBJECT) # kindly do not modify the MCVE with erroneous code
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sys' is not defined
>>> dir(anIntOBJECT)
['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', '__format__', '__getattribute__', '__getnewargs__', '__hash__', '__hex__', '__index__', '__init__', '__int__', '__invert__', '__long__', '__lshift__', '__mod__', '__mul__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'imag', 'numerator', 'real']
>>> len( dir( anIntOBJECT ) ) # Class-methods
64
Another approach with a string-based data-representation:
>>> aStrOBJECT = "08"
>>> aStrOBJECT.__sizeof__() # ____________________________instance-method works
23
>>> sys.getsizeof(aStrOBJECT) # kindly do not modify the MCVE with erroneous code
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sys' is not defined
>>> dir( aStrOBJECT )
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> len( dir( aStrOBJECT ) )
71
One more primary archetype for a mutable-sequence:
>>> aBytearrayOfINTs = bytearray() # in python base
>>> aBytearrayOfINTs.__sizeof__() # _________________________instance-method works
24
>>> dir( aBytearrayOfINTs )
['__add__', '__alloc__', '__class__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'capitalize', 'center', 'count', 'decode', 'endswith', 'expandtabs', 'extend', 'find', 'fromhex', 'index', 'insert', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'pop', 'remove', 'replace', 'reverse', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> len( dir( aBytearrayOfINTs ) )
76
In all cases, the data-structure access/maintenance methods will be a deciding factor, maybe more, than just the single-value ( minimum representation ) memory-footprint expectation.
Upvotes: 2
Reputation: 12748
as much I understand your last comment regarding numpy, you use a treelike structure.
You always can use an integer array also to store a tree.
For example:
index 0 = root 1 5
0 1 2 3 4 5 .... 15
+-------------------------------------------------------+
| 10 | 5 | 1 || ...... |
+-------------------------------------------------------+
| | ^ ^
| +----+ |
+------------------------------+
In this example, every entry is stored in three entries of the array and the first entry is the value and the others are the (relative -- multiply by 3) indices of the children.
So you can store any treelike data structure in an array
.
There is only the problem, that using the references is a little more complicated to program.
Also when your "pointers" have a different size (for example you need >65k entries -- you will need 32bit integers) you must use the bigger sizes. Of course you also could store the three values in two or even three arrays. Just make 3 arrays, one for the value (eg. only 16bit) and two for the references. In this case, you don't need to do any multiplication by yourself since your index will be the same as the normal array index in these arrays.
And by the way: I also guess, that your win would be low, when you would implement it with numpy
and let every entry in your tree be an array itself or use small objects containing just one value, since Python has a significant overhead for any object that is stored in the system. But when you use a small amount of arrays like in the example, every array is an object and the overhead is neglect-able since you store many of your entries in a small amount of arrays.
Upvotes: 3
Reputation: 363546
Have a look at numpy
arrays:
>>> import numpy as np
>>> np.array([0, 1, 2, 127, -128], dtype='int8')
array([ 0, 1, 2, 127, -128], dtype=int8)
np.int8
for example will give you efficient storage for signed integers up to 2**7 - 1
.
Warning: values out of range will "wrap around" without any overflow errors:
>>> np.int8(128)
-128
Of course, there are other dtype
-s and which is appropriate for you depends on the range of values you need.
Upvotes: 2
Reputation: 376052
The numpy
answer is a good one. Another simpler alternative is to use the array
module in the stdlib.
>>> import array
>>> array_of_ints = array.array("I")
>>> sys.getsizeof(array_of_ints)
28L
Upvotes: 1