Reputation: 1000
As the title says, I am testing Python's MultiOutputClassifier on a problem that requires predicting coordinates (x, y) as output, given 3 inputs, but the prediction only ever returns the closest learned value, not an 'extrapolated' one.
My sample code is as follows:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
train_data = np.array([
    [-30, -60, -90,  0,  0],
    [-50, -50, -50, 10,  0],
    [-90, -60, -30, 20,  0],
    [-50, -50, -95,  0, 10],
    [-60, -30, -60, 10, 10],
    [-95, -50, -50, 20, 10],
])
# These I just made up
test_data_x = np.array([
    [-35, -50, -90],
])
x = train_data[:, :3]  # the 3 inputs
y = train_data[:, 3:]  # the (x, y) coordinates to predict
forest = RandomForestClassifier(n_estimators=100, random_state=1)
classifier = MultiOutputClassifier(forest, n_jobs=-1)
classifier.fit(x, y)
print(classifier.predict(test_data_x))
This returns (0, 10), but for the given inputs I would expect the output to be something like (5, 5), i.e. somewhere between two of the learned values.
Clearly I am doing something wrong, or I have misunderstood how this works. Any help with this issue? Is MultiOutputClassifier simply not the right tool for this?
Upvotes: 0
Views: 76
Reputation: 323
The problem here is that a (random forest) classifier won't interpolate or extrapolate: it can only predict labels it has already seen during training. You probably want a regressor instead.
Replacing "Classifier" with "Regressor" in your code yields (0.8, 5.8) as output, which seems closer to what you expected.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
train_data = np.array([
    [-30, -60, -90,  0,  0],
    [-50, -50, -50, 10,  0],
    [-90, -60, -30, 20,  0],
    [-50, -50, -95,  0, 10],
    [-60, -30, -60, 10, 10],
    [-95, -50, -50, 20, 10],
])
test_data_x = np.array([
    [-35, -50, -90],
])
x = train_data[:, :3]
y = train_data[:, 3:]
forest = RandomForestRegressor(n_estimators=100, random_state=1)
classifier = MultiOutputRegressor(forest, n_jobs=-1)
classifier.fit(x, y)
print(classifier.predict(test_data_x))
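As a quick sanity check on the classifier version, you can look at the labels each per-output classifier actually learned during fit; predict() can only ever return one of them, which is why (5, 5) is impossible. A minimal sketch (reusing the x and y arrays from above; clf and est are just illustrative names, not part of your original code):
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

# Refit the classifier version from the question
clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=100, random_state=1))
clf.fit(x, y)

# Each fitted per-output estimator stores the labels it saw during training;
# predict() can only choose among these values.
for i, est in enumerate(clf.estimators_):
    print("output", i, "possible labels:", est.classes_)
# first output: 0, 10, 20 -- second output: 0, 10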
Upvotes: 1