Reputation: 28505
I was playing around with sys's getsizeof() and found that False (or 0) consists of fewer bytes than True (or 1). Why is that?
import sys
print("Zero: " + str(sys.getsizeof(0)))
print("One: " + str(sys.getsizeof(1)))
print("False: " + str(sys.getsizeof(False)))
print("True: " + str(sys.getsizeof(True)))
# Prints:
# Zero: 24
# One: 28
# False: 24
# True: 28
In fact, other numbers (also some that consist of more than one digit) are 28 bytes.
for n in range(0, 12):
    print(str(n) + ": " + str(sys.getsizeof(n)))
# Prints:
# 0: 24
# 1: 28
# 2: 28
# 3: 28
# 4: 28
# 5: 28
# 6: 28
# 7: 28
# 8: 28
# 9: 28
# 10: 28
# 11: 28
Even more: sys.getsizeof(999999999) is also 28 bytes! sys.getsizeof(9999999999), however, is 32.
So what's going on? I assume that the booleans True and False are internally converted to 1 and 0 respectively, but why is zero different in size from other small integers?
Side question: is this specific to how Python (3) represents these items, or is this generally how digits are presented in the OS?
Upvotes: 31
Views: 2167
Reputation: 365915
Remember that Python int values are of arbitrary size. How does that work?
Well, in CPython,[1] an int is represented by a PyLongObject, which has an array of 4-byte chunks[3], each holding 30 bits[2] worth of the number.
- 0 takes no chunks at all.
- 1 to (1<<30)-1 takes 1 chunk.
- 1<<30 to (1<<60)-1 takes 2 chunks.

And so on.
This is slightly oversimplified; for full details, see longintrepr.h in the source.
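The chunk counts above can be reproduced with a little arithmetic. This is only a rough sketch, not how CPython itself computes anything: chunks_needed is a hypothetical helper name, and it assumes a CPython build, where sys.int_info.bits_per_digit reports the chunk width (30 on typical builds). It counts logical chunks only; the byte sizes that sys.getsizeof reports can differ between CPython versions.

```python
import sys

# Bits stored per internal chunk ("digit"); 30 on typical CPython builds.
BITS = sys.int_info.bits_per_digit

def chunks_needed(n):
    """Hypothetical helper: number of BITS-wide chunks needed for abs(n)."""
    # Ceiling division of the bit length by the chunk width; zero needs none.
    return (abs(n).bit_length() + BITS - 1) // BITS

# Values on either side of each chunk boundary.
boundaries = (0, 1, (1 << BITS) - 1, 1 << BITS, (1 << 2 * BITS) - 1, 1 << 2 * BITS)
for value in boundaries:
    print(value, chunks_needed(value))
```

With 30-bit chunks, the jump from 1 chunk to 2 happens exactly at 1<<30, which is why 999999999 and 9999999999 in the question land in different size classes.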
In Python 2, there are two separate types, called int and long. An int is represented by a C 32-bit signed integer[4] embedded directly in the header, instead of an array of chunks. A long is like a Python 3 int.
If you do the same test with 0L, 1L, etc., to explicitly ask for long values, you will get the same results as in Python 3. But without the L suffix, any literal that fits in 32 bits gives you an int, and only literals that are too big give you longs.[5] (This means that (1<<31)-1 is an int, but 1<<31 is a 2-chunk long.)
1. In a different implementation, this might not be true. IIRC, Jython does roughly the same thing as CPython, but IronPython uses a C# "bignum" implementation.
2. Why 30 bits instead of 32? Mainly because the implementation of pow and ** can be simpler and faster if it can assume that the number of bits in two "digits" is divisible by 10.
3. It uses the C "struct hack". Technically, a PyLongObject is 28 bytes, but nobody ever allocates a PyLongObject; they malloc 24, 28, 32, 36, etc. bytes then cast to PyLongObject *.
4. In fact, a Python int is a C long, just to make things confusing. So the C API is full of things like PyInt_FromLong where the long means "32-bit int" and PyLong_FromSize_t where the long means "bignum".
5. Early versions of Python 2.x didn't integrate int and long as nicely, but hopefully nobody has to worry about those anymore.
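The "malloc 24, 28, 32, 36" pattern from the struct-hack footnote can be observed from Python. The sketch below assumes CPython, where sys.int_info exposes the chunk width and chunk size; int_with_chunks is a hypothetical helper name. It checks only the step between sizes, not the absolute numbers, since the size of the fixed header can vary between versions and platforms.

```python
import sys

BITS = sys.int_info.bits_per_digit        # bits per chunk (30 on typical builds)
DIGIT_BYTES = sys.int_info.sizeof_digit   # bytes per chunk (4 on typical builds)

def int_with_chunks(k):
    """Hypothetical helper: smallest positive int needing exactly k chunks (k >= 1)."""
    return 1 << (BITS * (k - 1))

# Reported sizes for ints that need 1, 2, 3, 4, and 5 chunks.
sizes = [sys.getsizeof(int_with_chunks(k)) for k in range(1, 6)]

# Difference between consecutive sizes: one chunk's worth of bytes each time.
steps = [b - a for a, b in zip(sizes, sizes[1:])]

print(sizes)
print(steps)
```

Each extra chunk adds exactly one chunk's worth of bytes on top of the fixed header, which matches the allocation pattern the footnote describes.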
Upvotes: 32