Reputation: 261
I'm using sklearn for random forest classification. I want to compare two descriptor sets (one with 125 features, one with 154 features), so I create two separate random forests. However, they seem to overwrite each other, which leads to the error: 'Number of features of the model must match the input. Model n_features is 125 and input n_features is 154'
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import plot_confusion_matrix

rf_std = RandomForestClassifier(n_estimators=150, max_depth=200, max_features='sqrt')
rf_nostd = RandomForestClassifier(n_estimators=150, max_depth=200, max_features='sqrt')
rf_std = rf_std.fit(X_train_std, y_train_std)
print('Testing score std:', rf_std.score(X_test_std, y_test_std))
rf_nostd = rf_nostd.fit(X_train_nostd, y_train_nostd)
print('Testing score nostd:', rf_nostd.score(X_test_nostd, y_test_nostd))
# until here it works
fig, (ax1, ax2) = plt.subplots(1, 2)
disp = plot_confusion_matrix(rf_std, X_test_std, y_test_std,
                             cmap=plt.cm.Blues,
                             normalize='true', ax=ax1)
disp = plot_confusion_matrix(rf_nostd, X_test_nostd, y_test_nostd,
                             cmap=plt.cm.Blues,
                             normalize='true', ax=ax2)
plt.show()
# here I get the error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-27-eee9fea5dbfb> in <module>
3 disp = plot_confusion_matrix(rf_std, X_test_std, y_test_std,
4 cmap=plt.cm.Blues,
----> 5 normalize='true',ax=ax1)
6 disp = plot_confusion_matrix(rf_nostd, X_test_nostd, y_test_nostd,
7 cmap=plt.cm.Blues,
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\metrics\_plot\confusion_matrix.py in plot_confusion_matrix(estimator, X, y_true, labels, sample_weight, normalize, display_labels, include_values, xticks_rotation, values_format, cmap, ax)
183 raise ValueError("plot_confusion_matrix only supports classifiers")
184
--> 185 y_pred = estimator.predict(X)
186 cm = confusion_matrix(y_true, y_pred, sample_weight=sample_weight,
187 labels=labels, normalize=normalize)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict(self, X)
610 The predicted classes.
611 """
--> 612 proba = self.predict_proba(X)
613
614 if self.n_outputs_ == 1:
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in predict_proba(self, X)
654 check_is_fitted(self)
655 # Check data
--> 656 X = self._validate_X_predict(X)
657
658 # Assign chunk of trees to jobs
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\ensemble\_forest.py in _validate_X_predict(self, X)
410 check_is_fitted(self)
411
--> 412 return self.estimators_[0]._validate_X_predict(X, check_input=True)
413
414 @property
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\tree\_classes.py in _validate_X_predict(self, X, check_input)
389 "match the input. Model n_features is %s and "
390 "input n_features is %s "
--> 391 % (self.n_features_, n_features))
392
393 return X
ValueError: Number of features of the model must match the input. Model n_features is 125 and input n_features is 154
EDIT: Fitting the second random forest somehow overwrites the first one:
rf_std = rf_std.fit(X_train_std, y_train_std)
print(rf_std.n_features_)
rf_nostd = rf_nostd.fit(X_train_nostd, y_train_nostd)
print(rf_std.n_features_)
Output:
154
125
Why aren't the two models separate? Can anyone help?
Upvotes: 2
Views: 371
Reputation: 199
This generally occurs when the train and test sets don't have the same number of features. Could you check that both of the comparisons below return True?
X_train_std.shape[1] == X_test_std.shape[1]
X_train_nostd.shape[1] == X_test_nostd.shape[1]
If both match, the shapes are fine; otherwise, you need to find the step where the train and test feature sets start to differ.
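The two comparisons above can be wrapped in a small helper that reports which pair disagrees. This is only a sketch: `n_columns` and `report_feature_mismatches` are made-up names, not part of sklearn, and the toy lists merely stand in for the real arrays.

```python
def n_columns(X):
    """Number of columns of a 2-D array-like (NumPy array, DataFrame, or list of rows)."""
    shape = getattr(X, "shape", None)
    if shape is not None:
        return shape[1]
    return len(X[0])  # fall back to the length of the first row


def report_feature_mismatches(named_pairs):
    """Return {name: (n_train_cols, n_test_cols)} for every pair whose column counts differ."""
    mismatches = {}
    for name, (X_train, X_test) in named_pairs.items():
        n_train, n_test = n_columns(X_train), n_columns(X_test)
        if n_train != n_test:
            mismatches[name] = (n_train, n_test)
    return mismatches


# Toy data standing in for the real descriptor sets (hypothetical values).
toy_pairs = {
    "std": ([[0.0] * 154] * 4, [[0.0] * 154] * 2),    # consistent widths
    "nostd": ([[0.0] * 125] * 4, [[0.0] * 154] * 2),  # test set has the wrong width
}
mismatches = report_feature_mismatches(toy_pairs)
print(mismatches)
```

With the real data you would pass something like {"std": (X_train_std, X_test_std), "nostd": (X_train_nostd, X_test_nostd)}; an empty result means every train/test pair has the same number of columns.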
Regards,
MJ
Upvotes: 0
Reputation: 36594
I was able to reproduce this error with inconsistent train and test input shapes.
Try this:
assert X_train_std.shape[-1] == X_test_std.shape[-1], "Input shapes don't match."
assert X_train_nostd.shape[-1] == X_test_nostd.shape[-1], "Input shapes don't match."
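Because the edit in the question shows rf_std.n_features_ changing after the second fit, it may also be worth checking that the two names really point to different objects and that each fitted model matches its own test set. A rough sketch, assuming the estimators expose `n_features_in_` (newer sklearn releases) or `n_features_` (the attribute used in the question); `fitted_feature_count`, `diagnose` and the `FakeForest` stand-in are hypothetical and only there so the snippet runs without sklearn:

```python
def fitted_feature_count(model):
    """Feature count recorded at fit time, trying the newer attribute name first."""
    for attr in ("n_features_in_", "n_features_"):
        if hasattr(model, attr):
            return getattr(model, attr)
    raise AttributeError("model does not look fitted")


def diagnose(model_a, model_b, n_cols_a, n_cols_b):
    """Return a list of human-readable problems; empty if nothing looks wrong."""
    problems = []
    if model_a is model_b:
        problems.append("both names refer to the same estimator object")
    if fitted_feature_count(model_a) != n_cols_a:
        problems.append("first model does not match its test set")
    if fitted_feature_count(model_b) != n_cols_b:
        problems.append("second model does not match its test set")
    return problems


class FakeForest:
    """Stand-in for a fitted estimator (hypothetical, for illustration only)."""

    def __init__(self, n_features):
        self.n_features_ = n_features


shared = FakeForest(125)  # one object bound to two names
aliased_report = diagnose(shared, shared, 154, 125)
separate_report = diagnose(FakeForest(154), FakeForest(125), 154, 125)
print(aliased_report)
print(separate_report)
```

With the real objects the call would look like diagnose(rf_std, rf_nostd, X_test_std.shape[1], X_test_nostd.shape[1]). If the first message appears, the two names were bound to the same estimator somewhere (for example in an earlier notebook cell), which is one possible explanation for the behaviour in the edit, not a confirmed one.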
This is how I reproduced your error:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
X_train_std = np.random.rand(400, 154)
X_test_std = np.random.rand(100, 125)
y_train_std = np.random.randint(0, 2, 400).tolist()
y_test_std = np.random.randint(0, 2, 100).tolist()
rf_std = RandomForestClassifier(n_estimators=150,
                                max_depth=200, max_features='sqrt')
rf_std = rf_std.fit(X_train_std, y_train_std)
print('Testing score std:', rf_std.score(X_test_std, y_test_std))
ValueError: Number of features of the model must match the input. Model n_features is 154 and input n_features is 125
Upvotes: 1