daniel sp

Reputation: 1000

MultiOutputClassifier only returns learned data

As the title says, I am testing scikit-learn's MultiOutputClassifier to solve a problem that requires predicting coordinates (x, y) as output from 3 inputs. It only returns the closest learned value as its prediction, not the 'extrapolated' one.

My sample code is as follows:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

train_data = np.array([
    [-30,-60,-90,0,0],
    [-50,-50,-50,10,0],
    [-90,-60,-30,20,0],
    [-50,-50,-95,0,10],
    [-60,-30,-60,10,10],
    [-95,-50,-50,20,10],
])
# These I just made up
test_data_x = np.array([
  [-35,-50,-90],
])

x = train_data[:, :3]
y = train_data[:, 3:]
forest = RandomForestClassifier(n_estimators=100, random_state=1)
classifier = MultiOutputClassifier(forest, n_jobs=-1)
classifier.fit(x,y)
print(classifier.predict(test_data_x))

This returns (0, 10), but for the given inputs I would expect something like (5, 5), somewhere between two of the learned values.

I assume I am doing something wrong or have misunderstood something. Can anyone help with this? Is MultiOutputClassifier simply not the right tool?

Upvotes: 0

Views: 76

Answers (1)

renatoc

Reputation: 323

The problem here is that a (Random Forest) classifier won't interpolate between labels. It treats each distinct target value as a separate class, so it can only output values it has already seen. You probably want to use a regressor.
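
To make this concrete, here is a minimal sketch that reuses the question's training data and inspects the labels each per-output forest learned. The variable names `label_sets` and `prediction` are my own, and I left out `n_jobs` for simplicity. Whatever input you pass in, every predicted coordinate has to come from one of these label sets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

train_data = np.array([
    [-30, -60, -90, 0, 0],
    [-50, -50, -50, 10, 0],
    [-90, -60, -30, 20, 0],
    [-50, -50, -95, 0, 10],
    [-60, -30, -60, 10, 10],
    [-95, -50, -50, 20, 10],
])
x = train_data[:, :3]
y = train_data[:, 3:]

classifier = MultiOutputClassifier(
    RandomForestClassifier(n_estimators=100, random_state=1)
)
classifier.fit(x, y)

# One fitted forest per output column; classes_ holds the labels it saw.
label_sets = [est.classes_.tolist() for est in classifier.estimators_]
print(label_sets)  # [[0, 10, 20], [0, 10]]

# The prediction for an unseen point is still drawn from those labels.
prediction = classifier.predict(np.array([[-35, -50, -90]]))
print(prediction)
```

A value like 5 is not in either label set, so the classifier can never return it.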

Replacing "Classifier" with "Regressor" in your code yields (0.8, 5.8) as output, which seems closer to what you expected.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

train_data = np.array([
    [-30,-60,-90,0,0],
    [-50,-50,-50,10,0],
    [-90,-60,-30,20,0],
    [-50,-50,-95,0,10],
    [-60,-30,-60,10,10],
    [-95,-50,-50,20,10],
])

test_data_x = np.array([
    [-35,-50,-90],
])

x = train_data[:, :3]
y = train_data[:, 3:]
forest = RandomForestRegressor(n_estimators=100, random_state=1)
regressor = MultiOutputRegressor(forest, n_jobs=-1)
regressor.fit(x, y)
print(regressor.predict(test_data_x))
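
As a side note, RandomForestRegressor accepts a two-column target directly, so the MultiOutputRegressor wrapper is optional. The sketch below is a variant under that assumption, not the exact code above: the trees are fit on both outputs jointly, so the numbers may differ from the wrapped version. The names `pred`, `lower` and `upper` are mine.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

train_data = np.array([
    [-30, -60, -90, 0, 0],
    [-50, -50, -50, 10, 0],
    [-90, -60, -30, 20, 0],
    [-50, -50, -95, 0, 10],
    [-60, -30, -60, 10, 10],
    [-95, -50, -50, 20, 10],
])
x = train_data[:, :3]
y = train_data[:, 3:]

# y has shape (n_samples, 2); the forest handles both outputs at once.
forest = RandomForestRegressor(n_estimators=100, random_state=1)
forest.fit(x, y)

pred = forest.predict(np.array([[-35, -50, -90]]))
print(pred)

# Each tree predicts an average of training targets, so the result
# stays inside the range of values seen for each output.
lower, upper = y.min(axis=0), y.max(axis=0)
print(lower, upper)
```

Keep in mind that a forest regressor averages training targets, so it can return in-between values such as 5 but will not go beyond the smallest or largest coordinate in the training set.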

Upvotes: 1
