derwahre_tj

Reputation: 349

Implementation details

I was reading the source code of the pandas library to learn more about its implementation, and the Series class left me puzzled. With most of the details stripped out, the class is defined like this:

class Series(np.ndarray, generic.PandasObject):
    def __new__(cls, data=None, index=None, dtype=None, name=None, copy=False):
        # some checks
        subarr = _sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
        return subarr

    def __init__(self, data=None, index=None, dtype=None, name=None, copy=False):
        pass

    # other class methods

def _sanitize_array(data, index, dtype=None, copy=False, raise_cast_failure=False):
    # some more instance checks
    subarr = np.array(data, dtype=object, copy=copy)
    return subarr

This confused me, because the cls argument is never used and no superclass methods are called. I don't see how this code works. As far as I understand it, a Series should be just an ndarray in disguise, because that's what is returned. Clearly I'm missing something.

Upvotes: 0

Views: 74

Answers (1)

Jeff

Reputation: 129018

In 0.12, Series is a subclass of ndarray, with lots of overridden methods. You are missing this line in __new__:

subarr = subarr.view(Series)

which views the array as a ``Series``, so the object returned from __new__ is an instance of the subclass rather than a plain ndarray.
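To illustrate the mechanism, here is a minimal sketch of the same pattern using plain NumPy. MiniSeries and its name attribute are hypothetical stand-ins for illustration, not the actual pandas code; the point is only that view(cls) turns the sanitized array into an instance of the subclass, after which the empty __init__ is called on it.

```python
import numpy as np


class MiniSeries(np.ndarray):
    """Hypothetical toy class, not the pandas implementation."""

    def __new__(cls, data=None, name=None):
        # Build a plain ndarray first (stand-in for _sanitize_array).
        subarr = np.array(data, dtype=object)
        # Reinterpret the same data as an instance of the subclass.
        subarr = subarr.view(cls)
        subarr.name = name
        return subarr

    def __init__(self, data=None, name=None):
        # Nothing to do: the object was fully built in __new__.
        pass

    def __array_finalize__(self, obj):
        # Called for views and slices; carry the name attribute over.
        self.name = getattr(obj, "name", None)


s = MiniSeries([1, 2, 3], name="x")
print(type(s).__name__)           # MiniSeries
print(isinstance(s, np.ndarray))  # True
```

Without the view(cls) call, __new__ would return a plain ndarray, which is exactly what the abbreviated code in the question appears to do.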

In any event, the code has changed quite a bit: in 0.13, Series is just like the other pandas objects, a subclass of NDFrame rather than of ndarray.

See here

Upvotes: 3
