Jensen

Reputation: 53

TypeError: fit() missing 1 required positional argument: 'y' (using sklearn - ExtraTreesRegressor)

I'm just trying out the sklearn Python library. I repurposed some code I was using for linear regression to fit a regression tree model, based on an example I saw (here's the example code):

def fit(self, X, y):
    """
    Fit a Random Forest model to data `X` and targets `y`.

    Parameters
    ----------
    X : array-like
        Input values.
    y : array-like
        Target values.
    """
    self.X = X
    self.y = y
    self.n = self.X.shape[0]
    self.model = ExtraTreesRegressor(**self.params)
    self.model.fit(X, y)

Here's the code I've written/repurposed:

data = pd.read_csv("rmsearch.csv", sep=",")
data = data[["price", "type", "number_bedrooms"]]
predict = "price"

X = np.array(data.drop([predict], 1))
y = np.array(data[predict])
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.2)

etr = ensemble.ExtraTreesRegressor
etr.fit(x_train, y_train)
acc = etr.score(x_test, y_test)
print("Accuracy; ", acc)

and I am getting this error:

etr.fit(x_train, y_train)
TypeError: fit() missing 1 required positional argument: 'y'

I know fit() takes 'X', 'y', and 'sample_weight' as input, but sample_weight defaults to None. The other examples haven't helped me much, but it could also be that I'm fairly new to Python and not able to spot a simple coding error.

fit() documentation:

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesRegressor.html#sklearn.ensemble.ExtraTreesRegressor.fit

Thanks for your help in advance.

Upvotes: 1

Views: 7625

Answers (1)

jkr

Reputation: 19310

The problem is here:

etr = ensemble.ExtraTreesRegressor
etr.fit(x_train, y_train)

You need to instantiate ensemble.ExtraTreesRegressor before calling fit on it. Change this code to

etr = ensemble.ExtraTreesRegressor()
etr.fit(x_train, y_train)

You get the seemingly strange error that y is missing because .fit is an instance method, so its first parameter is actually self. When you call .fit on an instance, self is passed automatically. If you call .fit on the class (as opposed to an instance), you have to supply self yourself. So your code is equivalent to ensemble.ExtraTreesRegressor.fit(self=x_train, X=y_train), which leaves nothing to fill y.

For an example of the difference, please see the example below. The two forms are functionally equivalent, but you can see that the first form is clunky.

from sklearn import ensemble

# Synthetic data.
x = [[0]]
y = [1]

myinstance = ensemble.ExtraTreesRegressor()
ensemble.ExtraTreesRegressor.fit(myinstance, x, y)

etr = ensemble.ExtraTreesRegressor()
etr.fit(x, y)
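
If it helps to see the error itself without sklearn involved, here is a minimal sketch using a made-up class. ToyRegressor and its mean_ attribute are hypothetical stand-ins for illustration, not part of scikit-learn. It should raise the same kind of TypeError when fit is called on the class, and work when fit is called on an instance.

```python
class ToyRegressor:
    """Hypothetical stand-in for an estimator; not part of scikit-learn."""

    def fit(self, X, y):
        # Store the mean target so the call has an observable effect.
        self.mean_ = sum(y) / len(y)
        return self


X = [[0], [1]]
y = [1.0, 3.0]

# Calling fit on the class itself: X is bound to self and y is bound to X,
# so nothing is left to fill the y parameter and Python raises TypeError.
error_message = None
try:
    ToyRegressor.fit(X, y)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Calling fit on an instance: Python passes the instance as self automatically.
model = ToyRegressor().fit(X, y)
print(model.mean_)
```

The exact wording of the message varies a little between Python versions (newer ones prefix the class name), but it ends with the same "missing 1 required positional argument: 'y'" as in the question.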

Upvotes: 2
