Reputation: 1005
I am using bank data to predict the number of tickets on a daily basis. To get more accurate results I am using stacking, via the brew library.
Here is the sample dataset for important features:
[]
Here is the target attribute sample:
[]
Here is the code:
import numpy as np
from stacked_generalization.lib.stacking import StackedClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
# Stage 1 (meta) model
bclf = LogisticRegression(random_state=1)
# Stage 0 (base) models; gbm is a model defined earlier (not shown here)
clfs = [RandomForestClassifier(n_estimators=40, criterion='gini', random_state=1),
        gbm,
        RidgeClassifier(random_state=1)]
sl = StackedClassifier(bclf, clfs)
# .values replaces as_matrix(), which newer pandas versions no longer provide
sl.fit(training.select_columns(features).to_dataframe().values, np.array(training['class']))
Here is the training data format:
[[ 21 11 2014 46 4 3]
[ 22 11 2014 46 5 4]
[ 24 11 2014 47 0 4]
...,
[ 30 9 2016 39 4 5]
[ 3 10 2016 40 0 1]
[ 4 10 2016 40 1 1]]
Now, when I try to fit the model, it gives the following error:
I compared my code with the example given in the library, but I still can't figure out where I am going wrong. Kindly assist me.
Upvotes: 2
Views: 183
Reputation: 1609
I had a similar issue, and it seems to be a bug in brew that needs to be fixed. The problem is that c.classes_ (the class labels) is a numpy array of floats (e.g., with two classes it is [0.0, 1.0] instead of the integers [0, 1]). The code tries to use these floats to index the columns, but you cannot index a numpy array with floats.
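To see the failure in isolation, here is a minimal sketch using only numpy. The arrays are made-up stand-ins (not brew's internals or the asker's data): `probas` plays the role of the output buffer, `predicted` the role of `c.predict_proba(X)`, and `float_classes` the role of `c.classes_` when the labels were stored as floats.

```python
import numpy as np

# Hypothetical stand-ins: 3 training examples, 2 classes.
probas = np.zeros((3, 2))
predicted = np.array([[0.9, 0.1],
                      [0.2, 0.8],
                      [0.5, 0.5]])
float_classes = np.array([0.0, 1.0])  # labels stored as floats

# Indexing columns with floats is rejected by current numpy versions.
try:
    probas[:, list(float_classes)] = predicted
    float_index_failed = False
except IndexError as exc:
    float_index_failed = True
    print("IndexError:", exc)

# Casting the labels to int makes the same assignment valid.
probas[:, list(float_classes.astype(int))] = predicted
```

Note that the cast only helps when the labels are already valid column positions (0, 1, ..., n_classes - 1).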
probas has one row per training example and one column per class.
c.predict_proba(X)
returns the probability of each class for each training example.
probas[:, list(c.classes_)] = c.predict_proba(X)
is meant to write those probabilities into probas, using each class label as the column index.
This would work if you add astype(int):
probas[:, list(c.classes_.astype(int))] = c.predict_proba(X)
or just copy the predictions directly:
probas = np.copy(c.predict_proba(X))
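Both fixes assume the class labels line up with the column positions. If the labels are something like 1, 3, 4, 5, or a base classifier only saw some of the classes, one possible workaround is to map each label to its position in the full sorted label list. The sketch below is not brew's code; `scatter_probas` and the sample arrays are hypothetical names chosen for illustration, and it assumes `all_classes` is sorted and contains every label in `clf_classes`.

```python
import numpy as np

def scatter_probas(all_classes, clf_classes, clf_probas):
    """Place per-class probabilities into columns ordered like all_classes.

    Assumes all_classes is sorted and clf_classes is a subset of it.
    """
    all_classes = np.asarray(all_classes)
    # Position of each classifier label within the full label list.
    cols = np.searchsorted(all_classes, clf_classes)
    out = np.zeros((clf_probas.shape[0], all_classes.shape[0]))
    out[:, cols] = clf_probas
    return out

# Made-up example: four possible labels, classifier trained on two of them.
all_classes = np.array([1.0, 3.0, 4.0, 5.0])
clf_classes = np.array([1.0, 4.0])
clf_probas = np.array([[0.7, 0.3],
                       [0.1, 0.9]])
result = scatter_probas(all_classes, clf_classes, clf_probas)
```

Here the two probability columns land in positions 0 and 2, and the columns for the unseen labels stay at zero.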
Upvotes: 1