Reputation: 565
I created a pass-through wrapper class around an existing class from sklearn
and it does not behave as expected:
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

tiny_df = pd.DataFrame({'x': ['a', 'b']})

class Foo(OrdinalEncoder):
    def __init__(self, *args, **kwargs):
        super().__init__(self, *args, **kwargs)

    def fit(self, X, y=None):
        super().fit(X, y)
        return self

oe = OrdinalEncoder()
oe.fit(tiny_df)  # works fine
foo = Foo()
foo.fit(tiny_df)  # fails
The relevant part of the error message I receive is:
~\.conda\envs\pytorch\lib\site-packages\sklearn\preprocessing\_encoders.py in _fit(self, X, handle_unknown)
69 raise ValueError("Unsorted categories are not "
70 "supported for numerical categories")
---> 71 if len(self._categories) != n_features:
72 raise ValueError("Shape mismatch: if n_values is an array,"
73 " it has to be of shape (n_features,).")
TypeError: object of type 'Foo' has no len()
Somehow the parent's private attribute _categories does not seem to get set, even though I call the parent constructor in the __init__() method of my class. I must be missing something simple here and would appreciate any help!
Upvotes: 1
Views: 314
Reputation: 4264
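You don't need to pass self again when calling super().__init__(); super() already binds it. With the extra self, the instance itself is passed as the first positional argument (categories in the version shown in your traceback), which is why the encoder later calls len() on a Foo object. In addition, scikit-learn estimators should always specify their parameters in the signature of their __init__; varargs (*args) are not allowed and lead to a RuntimeError, so you have to remove *args. I have modified your code as below: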
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

tiny_df = pd.DataFrame({'x': ['a', 'b']})

class Foo(OrdinalEncoder):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def fit(self, X, y=None):
        super().fit(X, y)
        return self

oe = OrdinalEncoder()
oe.fit(tiny_df)  # works fine
foo = Foo()
foo.fit(tiny_df)  # works fine too
SAMPLE OUTPUT
foo.transform(tiny_df)
array([[0.],
       [1.]])
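To see why the extra self produced the original error, here is a minimal sketch in plain Python. Base is a hypothetical stand-in for the encoder (it is not scikit-learn's class), and it assumes, as in the version from the traceback, that categories is the first positional parameter.

```python
class Base:
    """Hypothetical stand-in for an estimator whose first parameter is `categories`."""

    def __init__(self, categories="auto"):
        self.categories = categories


class Broken(Base):
    def __init__(self):
        # super() already binds self, so this extra self becomes `categories`.
        super().__init__(self)


class Fixed(Base):
    def __init__(self):
        super().__init__()


broken = Broken()
fixed = Fixed()

print(broken.categories is broken)  # True: the instance was stored as `categories`
print(fixed.categories)             # auto
```

Calling len() on broken.categories would then fail in the same way as in the question, because the stored object is the instance itself and defines no __len__.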
A little extra: if you keep *args in the constructor signature, like this:
class Foo(OrdinalEncoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def fit(self, X, y=None):
        super().fit(X, y)
        return self
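And when you create Foo and scikit-learn inspects its constructor (for example to display the estimator or in get_params()), you get:
foo = Foo()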
RuntimeError: scikit-learn estimators should always specify their parameters in the signature of their __init__ (no varargs). <class '__main__.Foo'> with constructor (self, *args, **kwargs) doesn't follow this convention.
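The reason **kwargs is accepted while *args is rejected can be illustrated with a signature check along these lines. This is only a sketch using the standard inspect module; has_varargs, WithVarargs and KwargsOnly are hypothetical names, and scikit-learn's own check may differ in detail.

```python
import inspect


def has_varargs(cls):
    """Return True if cls.__init__ declares a *args-style parameter."""
    params = inspect.signature(cls.__init__).parameters.values()
    return any(p.kind is inspect.Parameter.VAR_POSITIONAL for p in params)


class WithVarargs:
    def __init__(self, *args, **kwargs):
        pass


class KwargsOnly:
    def __init__(self, **kwargs):
        pass


print(has_varargs(WithVarargs))  # True
print(has_varargs(KwargsOnly))   # False
```

A **kwargs-only constructor passes this kind of check but hides the parameter names from introspection, so spelling out the parameters you need in __init__ is likely the more robust choice.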
Upvotes: 3