Reputation: 1588
Everything in Python is an object. So the size of an int in Python will be larger than usual.
>>> sys.getsizeof(int())
24
OK, but why does it take 12 more bytes for 2⁶³ compared to 2⁶³ - 1, and not just one?
>>> sys.getsizeof(2**63)
36
>>> sys.getsizeof(2**62)
24
I get that 2⁶³ is a long and 2⁶³ - 1 an int, but why 12 bytes of difference?
That was no more intuitive, so I tried some other things:
>>> a = 2**63
>>> a -= 2**62
>>> sys.getsizeof(a)
36
a is still stored as a long even if it could be in an int now. So that's not surprising. But:
>>> a = 2**63
>>> a -= (2**63 - 1)
>>> a
1L
>>> sys.getsizeof(a)
28
A new size.
>>> a = 2**63
>>> a -= 2**63
>>> a
0L
>>> sys.getsizeof(a)
24
Back to 24 bytes, but still with a long.
Last thing I got:
>>> sys.getsizeof(long())
24
Question:
How does the memory storage work in those scenarios?
Sub-questions:
Why is there a gap of 12 bytes to add what our intuition tells us is just 1 bit?
Why are int() and long() 24 bytes, but long(1) is already 28 bytes while int(2⁶²) is still only 24?
NB: Python 3.X is working a bit differently, but not more intuitively. Here I focused on Python 2.7; I did not test on prior versions.
Upvotes: 51
Views: 6329
Reputation: 155216
why does it get 12 more bytes for 2⁶³ compared to 2⁶³ - 1 and not just one?
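On an LP64 system[1], a Python 2 int consists of exactly three pointer-sized pieces: a reference count, a type pointer, and the actual value, stored as a C long int.
That's 24 bytes in total. On the other hand, a Python long consists of a reference count, a type pointer, a pointer-sized digit count, and an inline array of value digits, each holding 30 bits of the value but stored in a 32-bit unit.
2**63 requires 64 bits to store, so it fits in three 30-bit digits. Since each digit is 4 bytes wide, the whole Python long will take 24+3*4 = 36 bytes.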
In other words, the difference comes from long having to separately store the size of the number (8 additional bytes) and from it being slightly less space-efficient about storing the value (12 bytes to store the digits of 2**63). Including the size, the value 2**63 in a long occupies 20 bytes. Comparing that to the 8 bytes occupied by any value of the simple int yields the observed 12-byte difference.
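The arithmetic above can be written down as a small sketch. The function name `predicted_long_size` and its default parameters are hypothetical, chosen only for illustration; they encode the layout assumed in this answer (a 24-byte header on an LP64 build, plus one 4-byte digit per 30 bits of value) rather than anything measured from the interpreter.

```python
def predicted_long_size(n, header=24, bits_per_digit=30, digit_bytes=4):
    """Predict the size in bytes of a Python 2 long holding n.

    Assumed layout (not measured): a 24-byte header holding the reference
    count, type pointer and digit count, followed by one 4-byte digit for
    every 30 bits of the absolute value.
    """
    bits = abs(n).bit_length()              # 0 when n == 0
    ndigits = -(-bits // bits_per_digit)    # ceiling division
    return header + ndigits * digit_bytes


for value in (0, 1, 2**63):
    print(value, predicted_long_size(value))
```

For 0, 1 and 2**63 this gives 24, 28 and 36 bytes, which agrees with the sizes the question reports for 0L, 1L and 2**63.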
It is worth noting that Python 3 only has one integer type, called int, which is variable-width, and implemented the same way as Python 2 long.
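One way to see the digit-based layout on a running interpreter is to compare two values that differ by exactly one internal digit. This is only a sketch and assumes CPython, where `sys.int_info` reports the digit width; the variable names are illustrative.

```python
import sys

# Digit layout reported by the running interpreter (assumes CPython).
bits = sys.int_info.bits_per_digit
digit_bytes = sys.int_info.sizeof_digit

# Two values that differ by exactly one internal digit.
four_digits = 1 << (bits * 3)   # bit length 3*bits + 1, so 4 digits
five_digits = 1 << (bits * 4)   # bit length 4*bits + 1, so 5 digits

# The extra digit should account for the whole difference in size.
per_digit_step = sys.getsizeof(five_digits) - sys.getsizeof(four_digits)
print(bits, digit_bytes, per_digit_step)
```

On a build with 30-bit digits, the step between the two sizes should equal the 4 bytes per digit used in the calculation above.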
[1] 64-bit Windows differs in that it retains a 32-bit long int, presumably for source compatibility with a large body of older code that used char, short, and long as "convenient" aliases for 8, 16, and 32-bit values that happened to work on both 16 and 32-bit systems. To get an actual 64-bit type on x86-64 Windows, one must use __int64 or (on newer compiler versions) long long or int64_t. Since Python 2 internally depends on a Python int fitting into a C long in various places, sys.maxint remains 2**31-1, even on 64-bit Windows. This quirk is also fixed in Python 3, which has no concept of maxint.
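To check which data model a given machine uses, the widths of the relevant C types can be queried through ctypes. This is a minimal sketch; the variable names are illustrative, and the printed values depend on the platform it runs on.

```python
import ctypes

# Width in bytes of the C types on the platform running this interpreter.
c_long_bytes = ctypes.sizeof(ctypes.c_long)
c_long_long_bytes = ctypes.sizeof(ctypes.c_longlong)
pointer_bytes = ctypes.sizeof(ctypes.c_void_p)

# Largest value a signed C long can hold on this platform.
# On Python 2, this is the value that sys.maxint reports.
c_long_max = 2 ** (8 * c_long_bytes - 1) - 1

print("C long:     ", c_long_bytes, "bytes")
print("C long long:", c_long_long_bytes, "bytes")
print("pointer:    ", pointer_bytes, "bytes")
print("max C long: ", c_long_max)
```

An LP64 system should report 8 bytes for both the C long and the pointer, while 64-bit Windows should report a 4-byte C long next to an 8-byte pointer.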
Upvotes: 62
Reputation: 12837
While I didn't find it in the documentation, here is my explanation.
Python 2 implicitly promotes int to long when the value exceeds what an int can store. The size of the new object (a long) is 32 bytes in the example below. From then on, the size of your variable is determined by its value and can go up and down.
from sys import getsizeof as size
a = 1
n = 32
# going up
for i in range(10):
    if not i:
        print 'a = %100s%13s%4s' % (str(a), type(a), size(a))
    else:
        print 'a = %100s%14s%3s' % (str(a), type(a), size(a))
    a <<= n
# going down
for i in range(11):
    print 'a = %100s%14s%3s' % (str(a), type(a), size(a))
    a >>= n
a = 1 <type 'int'> 24
a = 4294967296 <type 'long'> 32
a = 18446744073709551616 <type 'long'> 36
a = 79228162514264337593543950336 <type 'long'> 40
a = 340282366920938463463374607431768211456 <type 'long'> 44
a = 1461501637330902918203684832716283019655932542976 <type 'long'> 48
a = 6277101735386680763835789423207666416102355444464034512896 <type 'long'> 52
a = 26959946667150639794667015087019630673637144422540572481103610249216 <type 'long'> 56
a = 115792089237316195423570985008687907853269984665640564039457584007913129639936 <type 'long'> 60
a = 497323236409786642155382248146820840100456150797347717440463976893159497012533375533056 <type 'long'> 64
a = 2135987035920910082395021706169552114602704522356652769947041607822219725780640550022962086936576 <type 'long'> 68
a = 497323236409786642155382248146820840100456150797347717440463976893159497012533375533056 <type 'long'> 64
a = 115792089237316195423570985008687907853269984665640564039457584007913129639936 <type 'long'> 60
a = 26959946667150639794667015087019630673637144422540572481103610249216 <type 'long'> 56
a = 6277101735386680763835789423207666416102355444464034512896 <type 'long'> 52
a = 1461501637330902918203684832716283019655932542976 <type 'long'> 48
a = 340282366920938463463374607431768211456 <type 'long'> 44
a = 79228162514264337593543950336 <type 'long'> 40
a = 18446744073709551616 <type 'long'> 36
a = 4294967296 <type 'long'> 32
a = 1 <type 'long'> 28
As you can see, the type stays long after the value first became too big for an int. The initial size was 32, but the size changes with the value (it can be higher than, lower than, or obviously equal to 32).
So, to answer your question, the base size is 24 for int and 28 for long, while long also has space for saving large values (which starts at 4 bytes, hence 32 bytes for long, but can go up and down according to the value).
As for your sub-question, creating a unique type (with a unique size) for every new number is impossible, so Python has "sub classes" of the long type, each of which deals with a range of numbers. Therefore, once you go over the limit of your old long, you must use the newer one, which accounts for much larger numbers too and therefore takes a few more bytes.
Upvotes: 5