ankush981

Reputation: 5417

__sizeof__ not getting called by sys.getsizeof

I'm writing a dynamic array implementation in Python (similar to the built-in list class), and I need to observe the growth in capacity (which doubles each time the limit is reached). I have the following code for that, but the output is weird: it looks like sys.getsizeof() never calls my class's __sizeof__(). For testing purposes I'm making __sizeof__() return 0, but sys.getsizeof() reports a non-zero value.

What's the catch?

import ctypes

class DynamicArray(object):
    '''
    DYNAMIC ARRAY CLASS (Similar to Python List)
    '''

    def __init__(self):
        self.n = 0 # Count actual elements (Default is 0)
        self.capacity = 1 # Default Capacity
        self.A = self.make_array(self.capacity)

    def __len__(self):
        """
        Return number of elements stored in array
        """
        return self.n

    def __getitem__(self,k):
        """
        Return element at index k
        """
        if not 0 <= k < self.n:
            raise IndexError('K is out of bounds!') # Check that index k is within the bounds of the array

        return self.A[k] #Retrieve from array at index k

    def append(self, ele):
        """
        Add element to end of the array
        """
        if self.n == self.capacity:
            self._resize(2*self.capacity) #Double capacity if not enough room

        self.A[self.n] = ele #Set self.n index to element
        self.n += 1

    def _resize(self,new_cap):
        """
        Resize internal array to capacity new_cap
        """
        print("resize called!")

        B = self.make_array(new_cap) # New bigger array

        for k in range(self.n): # Reference all existing values
            B[k] = self.A[k]

        self.A = B # Call A the new bigger array
        self.capacity = new_cap # Reset the capacity

    def make_array(self,new_cap):
        """
        Returns a new array with new_cap capacity
        """
        return (new_cap * ctypes.py_object)()

    def __sizeof__(self):
        return 0

The code used to test the resizing:

arr2 = DynamicArray()

import sys

for i in range(100):
    print(len(arr2), " ", sys.getsizeof(arr2))
    arr2.append(i)

And the output:

0   24
1   24
resize called!
2   24
resize called!
3   24
4   24
resize called!
5   24
6   24
7   24
8   24
resize called!
9   24
10   24
11   24
12   24
13   24
14   24
15   24
16   24
resize called!
17   24
18   24
19   24
20   24
21   24
22   24
23   24
24   24
25   24
26   24
27   24
28   24
29   24
30   24
31   24
32   24
resize called!
33   24
34   24
35   24
36   24
37   24
38   24
39   24
40   24
41   24
42   24
43   24
44   24
45   24
46   24
47   24
48   24
49   24
50   24
51   24
52   24
53   24
54   24
55   24
56   24
57   24
58   24
59   24
60   24
61   24
62   24
63   24
64   24
resize called!
65   24
66   24
67   24
68   24
69   24
70   24
71   24
72   24
73   24
74   24
75   24
76   24
77   24
78   24
79   24
80   24
81   24
82   24
83   24
84   24
85   24
86   24
87   24
88   24
89   24
90   24
91   24
92   24
93   24
94   24
95   24
96   24
97   24
98   24
99   24

Upvotes: 4

Views: 844

Answers (1)

Dimitris Fasarakis Hilliard

Reputation: 160557

Your __sizeof__ is getting called; sys.getsizeof just adds the garbage collector overhead to its result, which is why the total isn't zero.

From the docs on sys.getsizeof:

getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

Returning 0 makes it hard to see that your method is being called, since you'll always get the same result back (0 + overhead).

Return a size based on the contents of the dynamic array to see it change.
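As a minimal sketch of that suggestion, the trimmed-down class below keeps only the parts of the question's DynamicArray that affect size and has __sizeof__ report the storage of the backing ctypes array. Using ctypes.sizeof(self.A) as the reported size is an assumption made for illustration; it counts only the pointer slots, not the instance's own fields or the stored objects.

```python
import ctypes
import sys


class DynamicArray:
    """Trimmed-down version of the question's class (size-related parts only)."""

    def __init__(self):
        self.n = 0
        self.capacity = 1
        self.A = self.make_array(self.capacity)

    def append(self, ele):
        if self.n == self.capacity:
            self._resize(2 * self.capacity)  # double capacity when full
        self.A[self.n] = ele
        self.n += 1

    def _resize(self, new_cap):
        B = self.make_array(new_cap)
        for k in range(self.n):
            B[k] = self.A[k]
        self.A = B
        self.capacity = new_cap

    def make_array(self, new_cap):
        return (new_cap * ctypes.py_object)()

    def __sizeof__(self):
        # Assumption for illustration: report only the backing array's
        # pointer storage, i.e. capacity * sizeof(py_object).
        return ctypes.sizeof(self.A)


arr = DynamicArray()
sizes = []
for i in range(9):
    sizes.append(sys.getsizeof(arr))  # __sizeof__() result + GC overhead
    print(i, sizes[-1])
    arr.append(i)
```

With this change the reported size should step up at each resize, by one pointer width per added slot, while the garbage collector overhead stays a constant offset on top.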


To further elaborate:

Each object in CPython that is managed by the garbage collector has some administrative information attached to it in a PyGC_Head struct, which the garbage collector uses. The size of that struct gets added:

/* add gc_head size */
if (PyObject_IS_GC(o))
    return ((size_t)size) + sizeof(PyGC_Head);
return (size_t)size;

This is probably added to the overall size because it does represent additional memory required by the object. At the Python level you don't need to worry about garbage collection and can treat it all like magic, but when you ask for the size of an object, correct results should not be sacrificed just to keep the illusion alive.
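One way to observe the overhead from Python is to subtract an object's own __sizeof__() result from what sys.getsizeof reports. The sketch below assumes CPython; Probe and gc_overhead are hypothetical names introduced only for this illustration, and the exact overhead in bytes varies between interpreter versions, so no specific numbers are shown.

```python
import sys


class Probe:
    """Hypothetical class: __sizeof__ counts its calls and reports 0."""
    calls = 0

    def __sizeof__(self):
        Probe.calls += 1
        return 0


def gc_overhead(obj):
    # Whatever getsizeof adds on top of the object's own __sizeof__ result
    return sys.getsizeof(obj) - obj.__sizeof__()


probe = Probe()
probe_total = sys.getsizeof(probe)    # 0 + overhead for a GC-managed instance
calls_after_getsizeof = Probe.calls   # shows __sizeof__ really was invoked

int_overhead = gc_overhead(12345)     # ints are not managed by the GC
list_overhead = gc_overhead([])       # lists are managed by the GC

print(calls_after_getsizeof, probe_total, int_overhead, list_overhead)
```

On CPython the call counter should read 1 after the single getsizeof call, the int should show no overhead, and both the list and the Probe instance should show a positive one.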

Upvotes: 6
