user1584253

Reputation: 1005

python - Stacked Classifier: IndexError while fitting the data

I am using bank data to predict the number of tickets per day. To get more accurate results, I am stacking several classifiers with the brew library.

Here is the sample dataset for important features:

[enter image description here]

Here is the target attribute sample:

[enter image description here]

Here is the code:

import numpy as np
from stacked_generalization.lib.stacking import StackedClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier

# Stage 1 (meta) model
bclf = LogisticRegression(random_state=1)

# Stage 0 (base) models; gbm is defined elsewhere (not shown)
clfs = [RandomForestClassifier(n_estimators=40, criterion='gini', random_state=1),
        gbm,
        RidgeClassifier(random_state=1)]

sl = StackedClassifier(bclf, clfs)
# Convert the selected feature columns to a NumPy array
# (.values is equivalent to the deprecated .as_matrix())
X = training.select_columns(features).to_dataframe().values
y = np.array(training['class'])
sl.fit(X, y)

Here is the training data format:

[[  21   11 2014   46    4    3]
 [  22   11 2014   46    5    4]
 [  24   11 2014   47    0    4]
 ..., 
 [  30    9 2016   39    4    5]
 [   3   10 2016   40    0    1]
 [   4   10 2016   40    1    1]]

Now, when I try to fit the model, it raises an IndexError:

[enter image description here]

I compared my code with the example given in the library, but I still can't figure out where I am going wrong. Kindly assist me.

Upvotes: 2

Views: 183

Answers (1)

ansonw

Reputation: 1609

I had a similar issue, and it seems to be a bug in brew that needs to be fixed. The problem is that c.classes_ (the array of class labels) is a NumPy array of floats: with two classes it holds [0.0, 1.0] instead of the integers [0, 1]. The code uses these values to index the columns of probas, but NumPy arrays cannot be indexed with floats.
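A minimal sketch of the failure in plain NumPy. The array `classes` is a made-up stand-in for `c.classes_`, and none of this is brew's actual code; it only shows that float values are rejected as column indices:

```python
import numpy as np

# Hypothetical stand-in for c.classes_ when the labels are floats
classes = np.array([0.0, 1.0])

# One row per training example, one column per class
probas = np.zeros((3, len(classes)))

try:
    # Same indexing pattern as in the answer: floats used as column indices
    probas[:, list(classes)] = 1.0
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```
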

probas has one row per training example and one column per class.

c.predict_proba(X) returns the probability of each class for each training example.

probas[:, list(c.classes_)] = c.predict_proba(X)

This line should write the class probabilities for each row of X into probas, using the class labels as column indices.

This would work if you add astype(int):

probas[:, list(c.classes_.astype(int))] = c.predict_proba(X)

or just copy the result directly:

probas = np.copy(c.predict_proba(X))
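Both fixes can be sketched with plain NumPy. The arrays `classes` and `proba` below are made-up stand-ins for `c.classes_` and `c.predict_proba(X)`, so this is an illustration under those assumptions rather than a patch to brew:

```python
import numpy as np

# Hypothetical stand-ins for c.classes_ and c.predict_proba(X)
classes = np.array([0.0, 1.0])
proba = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.4, 0.6]])

# Option 1: cast the class labels to int before using them as column indices
probas = np.zeros((proba.shape[0], len(classes)))
probas[:, list(classes.astype(int))] = proba

# Option 2: copy the predicted probabilities directly
probas_copy = np.copy(proba)

print(np.array_equal(probas, probas_copy))  # prints True
```

Both options give the same result here because the labels are exactly 0 and 1. If the labels were not the consecutive integers 0..n-1, indexing by label would no longer line up with the columns of predict_proba.
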

Upvotes: 1
