Reputation: 497
I'm working on a new data structure that subclasses pandas DataFrame. I want to enforce my new data structure to have new_property, so that it can be processed safely later on. However, I'm running into error when using my new data structure, because the constructor gets called by some internal pandas function without the required property. Here is my new data structure.
import pandas as pd
class MyDataFrame(pd.DataFrame):
@property
def _constructor(self):
return MyDataFrame
_metadata = ['new_property']
def __init__(self, data, new_property, index=None, columns=None, dtype=None, copy=True):
super(MyDataFrame, self).__init__(data=data,
index=index,
columns=columns,
dtype=dtype,
copy=copy)
self.new_property = new_property
Here is an example that causes error
data1 = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [15, 25, 30], 'd': [1, 1, 2]}
df1 = MyDataFrame(data1, new_property='value')
df1[['a', 'b']]
Here is the error message
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-
packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-33-b630fbf14234>", line 1, in <module>
df1[['a', 'b']]
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2053, in __getitem__
return self._getitem_array(key)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2098, in _getitem_array
return self.take(indexer, axis=1, convert=True)
File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1670, in take
result = self._constructor(new_data).__finalize__(self)
TypeError: __init__() missing 1 required positional argument: 'new_property'
Is there a fix to this or an alternative way to design this to enforce my new data structure to have new_property?
Thanks in advance!
Upvotes: 7
Views: 1586
Reputation: 1774
I know this is an old issue, but I wanted to extend on hlu's answer.
When implementing the answer described by hlu, I was getting the following error when just trying to print the subclassed DataFrame: AttributeError: 'internal_constructor' object has no attribute '_from_axes'
To fix this, I have used an object instead of the function used in hlu's answer to be able to implement the _from_axes
method on the callable.
There is no classmethod type decorator for the _internal_constructor class, so instead we instantiate it with the callers class so it can be used when the _internal_constructor is called.
class MyDataFrame(pd.DataFrame):
@property
def _constructor(self):
return MyDataFrame._internal_constructor(self.__class__)
class _internal_constructor(object):
def __init__(self, cls):
self.cls = cls
def __call__(self, *args, **kwargs):
kwargs['my_required_argument'] = None
return self.cls(*args, **kwargs)
def _from_axes(self, *args, **kwargs):
return self.cls._from_axes(*args, **kwargs)
Upvotes: 0
Reputation: 497
This question has been answered by a brilliant pandas developer. See this issue for more details. Pasting the answer here.
class MyDataFrame(pd.DataFrame):
@property
def _constructor(self):
return MyDataFrame._internal_ctor
_metadata = ['new_property']
@classmethod
def _internal_ctor(cls, *args, **kwargs):
kwargs['new_property'] = None
return cls(*args, **kwargs)
def __init__(self, data, new_property, index=None, columns=None, dtype=None, copy=True):
super(MyDataFrame, self).__init__(data=data,
index=index,
columns=columns,
dtype=dtype,
copy=copy)
self.new_property = new_property
data1 = {'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [15, 25, 30], 'd': [1, 1, 2]}
df1 = MyDataFrame(data1, new_property='value')
df1[['a', 'b']].new_property
Out[121]: 'value'
MyDataFrame(data1)
TypeError: __init__() missing 1 required positional argument: 'new_property'
Upvotes: 5