hlu

Reputation: 497

Subclass pandas DataFrame with required argument

I'm working on a new data structure that subclasses pandas DataFrame. I want to require every instance to have new_property, so that it can be processed safely later on. However, I run into an error when using the new data structure, because some internal pandas function calls the constructor without the required argument. Here is my new data structure:

import pandas as pd
class MyDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        return MyDataFrame

    _metadata = ['new_property']

    def __init__(self, data, new_property, index=None, columns=None, dtype=None, copy=True):

        super(MyDataFrame, self).__init__(data=data,
                                          index=index,
                                          columns=columns,
                                          dtype=dtype,
                                          copy=copy)
        self.new_property = new_property

Here is an example that causes the error:

data1 = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [15, 25, 30], 'd': [1, 1, 2]}
df1 = MyDataFrame(data1, new_property='value')
df1[['a', 'b']]

Here is the error message

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-33-b630fbf14234>", line 1, in <module>
    df1[['a', 'b']]
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2053, in __getitem__
    return self._getitem_array(key)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2098, in _getitem_array
    return self.take(indexer, axis=1, convert=True)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1670, in take
    result = self._constructor(new_data).__finalize__(self)
TypeError: __init__() missing 1 required positional argument: 'new_property'

Is there a fix for this, or an alternative design that guarantees my new data structure always has new_property?

Thanks in advance!

Upvotes: 7

Views: 1586

Answers (2)

Stuart Buckingham

Reputation: 1774

I know this is an old issue, but I wanted to expand on hlu's answer.

When implementing the answer described by hlu, I was getting the following error when just trying to print the subclassed DataFrame: AttributeError: 'internal_constructor' object has no attribute '_from_axes'

To fix this, I replaced the function from hlu's answer with a callable object, which lets me implement the _from_axes method on it.

A nested class cannot take a classmethod decorator the way a method can, so instead we instantiate _internal_constructor with the caller's class, which it then uses when it is called.

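import pandas as pd

class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # Return a callable object rather than a bare function, so that
        # attributes such as _from_axes can be looked up on it.
        return MyDataFrame._internal_constructor(self.__class__)

    class _internal_constructor(object):
        def __init__(self, cls):
            # Remember which class to build when this object is called.
            self.cls = cls

        def __call__(self, *args, **kwargs):
            # Fill in the required argument that pandas does not pass.
            kwargs['my_required_argument'] = None
            return self.cls(*args, **kwargs)

        def _from_axes(self, *args, **kwargs):
            # Delegate to the wrapped class's own _from_axes.
            return self.cls._from_axes(*args, **kwargs)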
Upvotes: 0

hlu

Reputation: 497

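This question was answered by a brilliant pandas developer. See this issue for more details. I'm pasting the answer here. The trick is that _constructor returns a classmethod that passes new_property=None, so pandas can build new objects internally; __finalize__ then copies the real value over from the original object, because new_property is listed in _metadata.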

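import pandas as pd

class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # pandas calls this internally whenever it builds a new frame.
        return MyDataFrame._internal_ctor

    # Attributes listed here are copied to results by __finalize__.
    _metadata = ['new_property']

    @classmethod
    def _internal_ctor(cls, *args, **kwargs):
        # Supply a placeholder for the required argument; __finalize__
        # replaces it with the value from the source object afterwards.
        kwargs['new_property'] = None
        return cls(*args, **kwargs)

    def __init__(self, data, new_property, index=None, columns=None, dtype=None, copy=True):
        super(MyDataFrame, self).__init__(data=data,
                                          index=index,
                                          columns=columns,
                                          dtype=dtype,
                                          copy=copy)
        self.new_property = new_property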

data1 = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [15, 25, 30], 'd': [1, 1, 2]}
df1 = MyDataFrame(data1, new_property='value')

df1[['a', 'b']].new_property
Out[121]: 'value'

MyDataFrame(data1)
TypeError: __init__() missing 1 required positional argument: 'new_property'
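To see why this works, here is a toy model with no pandas involved. It is a simplified sketch, not pandas' actual implementation: Frame, BrokenFrame, FixedFrame and select are names I made up, and the only thing taken from pandas is the pattern on the last line of the traceback in the question, `self._constructor(new_data).__finalize__(self)`.

```python
class Frame:
    """Toy stand-in for DataFrame; NOT pandas' real implementation."""

    _metadata = []

    def __init__(self, data):
        self.data = dict(data)

    @property
    def _constructor(self):
        return type(self)

    def __finalize__(self, other):
        # Copy every attribute named in _metadata from the source object.
        for name in self._metadata:
            setattr(self, name, getattr(other, name, None))
        return self

    def select(self, keys):
        # Same shape as the traceback line:
        #   result = self._constructor(new_data).__finalize__(self)
        new_data = {k: self.data[k] for k in keys}
        return self._constructor(new_data).__finalize__(self)


class BrokenFrame(Frame):
    """Subclass with a required argument and the default _constructor."""

    _metadata = ["new_property"]

    def __init__(self, data, new_property):
        super().__init__(data)
        self.new_property = new_property


class FixedFrame(BrokenFrame):
    """Same subclass, but internal construction goes through a classmethod."""

    @property
    def _constructor(self):
        return type(self)._internal_ctor

    @classmethod
    def _internal_ctor(cls, *args, **kwargs):
        # Placeholder value; __finalize__ overwrites it right after.
        kwargs["new_property"] = None
        return cls(*args, **kwargs)


data = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [15, 25, 30]}

broken_error = None
try:
    BrokenFrame(data, new_property="value").select(["a", "b"])
except TypeError as exc:
    # Internal construction fails: the required argument is never passed.
    broken_error = str(exc)

# Internal construction succeeds and the metadata is carried over.
subset = FixedFrame(data, new_property="value").select(["a", "b"])
```

In the toy, as in the real example above, calling the class directly without new_property still raises TypeError, so the argument stays required for user code while internal calls get the placeholder.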

Upvotes: 5
