Reputation: 950

Python: Declare 2 Byte Variables

Is there a python module that allows me to store a number in a significantly smaller object than int?

My memory is not big enough for some of my data structures, so I would like to represent numbers using as little space as possible.

The problem is I store the values in a tree-structure:

All integers are either keys or values. Maybe then the actual size won't be reduced much when using smaller variables - I don't know.

Upvotes: 4

Answers (4)

user3666197

Reputation: 1

Memory-compact design

Your task is clearly directed towards a memory light-weight footprint of some 2B-values.

The solution in Python has a few more things to take into account.

Assuming the range of values, meeting the range-of-values represented within a 2B-scope of int-s ( a range of numbers 0..65535 ) to be memory-efficient is just an illusion.

Your design shall take into account, as a minimum, the overheads of associated with an object's "internal_representation" + the overheads associated with object-instance's access-methods, to get a number "there" and "back" ( not speaking about performance impact, as your requirement was focused on low-memory-footprint, but the smarter are the proxy-data-structures, the bigger typically grow the data-structure access/maintenance CPU-costs ).

If your main motivation was to design for a low-memory foot-print, the worst-case scenario shall sum up all these static memory-allocations + the requirements for any dynamic-allocations, requested during the access-methods' and object-modifications' operations + the size of the import-ed module(s) sizes.

Once speaking about numpy -- a great tool -- you would get a great & powerful library for fast, vectorised array manipulations, with a given cell-data representation ( dtype = numpy.uint8 ).

The hidden-part ( for any assumption on the code-design low-memory-footprint ) is to take into account the overall memory-cost of the approach.

Compare not just an object size but also the Class-related overhead ( immense list of numpy methods not shown for clarity ):

>>> anIntOBJECT= 8
>>> anIntOBJECT.__sizeof__()   # ____________________________instance-method works
12
>>> sys.getsizeof(anIntOBJECT) # kindly do not modify the MCVE with erroneous code
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sys' is not defined
>>> dir(anIntOBJECT)
['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', '__format__', '__getattribute__', '__getnewargs__', '__hash__', '__hex__', '__index__', '__init__', '__int__', '__invert__', '__long__', '__lshift__', '__mod__', '__mul__', '__neg__', '__new__', '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__trunc__', '__xor__', 'bit_length', 'conjugate', 'denominator', 'imag', 'numerator', 'real']
>>> len( dir( anIntOBJECT ) )                  # Class-methods
64

Another approach with a string-based data-representation:

>>> aStrOBJECT = "08"
>>> aStrOBJECT.__sizeof__()    # ____________________________instance-method works
23
>>> sys.getsizeof(aStrOBJECT)  # kindly do not modify the MCVE with erroneous code
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'sys' is not defined
>>> dir( aStrOBJECT )
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> len( dir( aStrOBJECT ) )
71

One more primary archetype for a mutable-sequence:

>>> aBytearrayOfINTs = bytearray()                                # in python base
>>> aBytearrayOfINTs.__sizeof__() # _________________________instance-method works
24
>>> dir( aBytearrayOfINTs )
['__add__', '__alloc__', '__class__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'capitalize', 'center', 'count', 'decode', 'endswith', 'expandtabs', 'extend', 'find', 'fromhex', 'index', 'insert', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'pop', 'remove', 'replace', 'reverse', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> len( dir( aBytearrayOfINTs ) )
76

Real data-structure implementation: access & maintenance methods decide

In all cases, the data-structure access/maintenance methods will be a deciding factor, maybe more, than just the single-value ( minimum representation ) memory-footprint expectation.

Upvotes: 2

Juergen

Reputation: 12748

as much I understand your last comment regarding numpy, you use a treelike structure.

You always can use an integer array also to store a tree.

For example:

index 0 = root     1                     5
      0   1   2    3   4   5 ....        15
   +-------------------------------------------------------+
   | 10 | 5 | 1 ||      ......                             |
   +-------------------------------------------------------+
          |   |    ^                     ^
          |   +----+                     |
          +------------------------------+

In this example, every entry is stored in three entries of the array and the first entry is the value and the others are the (relative -- multiply by 3) indices of the children.

So you can store any treelike data structure in an array.

There is only the problem, that using the references is a little more complicated to program.

Also when your "pointers" have a different size (for example you need >65k entries -- you will need 32bit integers) you must use the bigger sizes. Of course you also could store the three values in two or even three arrays. Just make 3 arrays, one for the value (eg. only 16bit) and two for the references. In this case, you don't need to do any multiplication by yourself since your index will be the same as the normal array index in these arrays.

And by the way: I also guess, that your win would be low, when you would implement it with numpy and let every entry in your tree be an array itself or use small objects containing just one value, since Python has a significant overhead for any object that is stored in the system. But when you use a small amount of arrays like in the example, every array is an object and the overhead is neglect-able since you store many of your entries in a small amount of arrays.

Upvotes: 3

wim

Reputation: 363546

Have a look at numpy arrays:

>>> import numpy as np
>>> np.array([0, 1, 2, 127, -128], dtype='int8')
array([   0,    1,    2,  127, -128], dtype=int8)

np.int8 for example will give you efficient storage for signed integers up to 2**7 - 1.

Warning: values out of range will "wrap around" without any overflow errors:

>>> np.int8(128)
-128

Of course, there are other dtype-s and which is appropriate for you depends on the range of values you need.

Upvotes: 2

Ned Batchelder

Reputation: 376052

The numpy answer is a good one. Another simpler alternative is to use the array module in the stdlib.

>>> import array
>>> array_of_ints = array.array("I")
>>> sys.getsizeof(array_of_ints)
28L

Upvotes: 1

Python: Declare 2 Byte Variables

Answers (4)

Memory-compact design

Real data-structure implementation: access & maintenance methods decide

Related Questions