yobro97

Reputation: 1145

Precision of a class turns out to be zero using MLP classifier

I have a dataset of about 45000 samples, each with a binary output of either 0 or 1. After using the MLPClassifier from the sklearn package, I obtained a model that always outputs 1 no matter what the input is, so the precision of class 0 is zero. I have tried changing the hyperparameters of the model, but the output stays the same. Can anyone suggest a way around this?

                 precision    recall  f1-score   support

              0       0.00      0.00      0.00     19967
              1       0.57      1.00      0.73     26688

    avg / total       0.33      0.57      0.42     46655

PS: My code

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn import metrics

    loc = './new_attributes_66.csv'
    data = pd.read_csv(loc)

    # Scale every column (features and label) to [-1, 1].
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaler.fit(data)
    data = scaler.transform(data)
    print(data)

    X = data[:, 0:64]   # feature columns
    y = data[:, 65]     # label column
    X_tr, X_tst, y_tr, y_tst = train_test_split(X, y, test_size=0.1)

    clf = MLPClassifier(solver='sgd', alpha=1e-5, hidden_layer_sizes=(40, 121),
                        random_state=0, warm_start=True, tol=1e-7,
                        early_stopping=False, learning_rate='adaptive',
                        learning_rate_init=0.1, max_iter=10000,
                        shuffle=True, verbose=True)

    clf.fit(X_tr, y_tr)
    predicted = clf.predict(X)
    # print("Accuracy using MLP classifier:")
    print(metrics.precision_score(y, predicted))
    # print(confusion_matrix(y_tst, predicted))
    print(metrics.classification_report(y, predicted))
    # print(clf.coefs_)

Link to the dataset (csv) : https://app.box.com/s/vfqgool2u9ovdc9oyi9elq99aor6c6gk

Update: Following the suggestions, I have modified my code and was able to improve the precision and recall:

         precision    recall  f1-score   support

   -1.0       0.53      0.10      0.17     19967
    1.0       0.58      0.93      0.72     26688

avg / total       0.56      0.58      0.48     46655

With an accuracy of 58.14%. In what other ways can the hyperparameters be varied?
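One systematic way to vary them is a cross-validated grid search with sklearn's `GridSearchCV`. A minimal sketch on small synthetic data (standing in for the actual CSV, which I can't inline here); the grid values are illustrative, not tuned:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Small synthetic binary problem standing in for the real dataset.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Candidate hyperparameter values to sweep (illustrative choices).
param_grid = {
    'hidden_layer_sizes': [(20,), (40, 20)],
    'alpha': [1e-5, 1e-3],
    'learning_rate_init': [0.01, 0.1],
}

# 3-fold cross-validation over every combination, scored by F1.
search = GridSearchCV(
    MLPClassifier(solver='sgd', max_iter=500, random_state=0),
    param_grid, cv=3, scoring='f1')
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
```

`best_params_` then gives the combination that scored highest under cross-validation, which is a more defensible way to pick hyperparameters than changing them one at a time by hand.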

Upvotes: 3

Views: 3773

Answers (2)

yobro97

Reputation: 1145

Hey guys, after a suggestion from Mohammed Kasif, I tried the AdaBoostClassifier on the data, scaled to [-1, 1], and obtained the following results:

Accuracy: 0.682432189042

         precision    recall  f1-score   support

   -1.0       0.59      0.56      0.57     19967
    1.0       0.68      0.71      0.70     26688

avg / total       0.64      0.65      0.64     46655

This is a large improvement over the 57-58% we were able to get with the MLPClassifier, or even with the AdaBoostClassifier without the scaling. Anyone with better results is free to post their ideas :)
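For reference, the scale-then-boost setup can be sketched as a sklearn pipeline. This uses synthetic stand-in data rather than the actual CSV, and the labels -1/1 mirror the report above:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the real dataset, with labels -1 / 1.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 8))
y = (X[:, 0] > 5).astype(int) * 2 - 1

X_tr, X_tst, y_tr, y_tst = train_test_split(X, y, test_size=0.1,
                                            random_state=0)

# Scale features to [-1, 1], then boost; the pipeline applies the
# same scaling to the test set automatically.
clf = make_pipeline(MinMaxScaler(feature_range=(-1, 1)),
                    AdaBoostClassifier(n_estimators=100, random_state=0))
clf.fit(X_tr, y_tr)
print(accuracy_score(y_tst, clf.predict(X_tst)))
```

Using a pipeline also avoids the leak in my original code, where the scaler was fit on the whole dataset before the train/test split.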

Upvotes: 2

Gambit1614

Reputation: 8811

Your data may be suffering from a class imbalance problem. It might be the case that the number of samples with label 1 far outnumbers those with label 0. There are various ways to tackle a class imbalance problem, for example over-sampling the minority class or under-sampling the majority class.

You can also try different values of alpha or different shapes of hidden layers. Maybe the configuration you are currently using is not able to learn properly.
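As one concrete remedy for the imbalance, a minimal sketch of over-sampling the minority class with `sklearn.utils.resample` (on synthetic data, not your CSV):

```python
import numpy as np
from sklearn.utils import resample

# Imbalanced toy data: 80 samples of class 1, 20 of class 0.
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = np.array([1] * 80 + [0] * 20)

X_min, y_min = X[y == 0], y[y == 0]
X_maj, y_maj = X[y == 1], y[y == 1]

# Draw with replacement from the minority class until it matches
# the majority class count.
X_up, y_up = resample(X_min, y_min, replace=True,
                      n_samples=len(y_maj), random_state=0)

X_bal = np.vstack([X_maj, X_up])
y_bal = np.concatenate([y_maj, y_up])
print(np.bincount(y_bal))   # both classes now have 80 samples
```

You would then fit the classifier on the balanced `X_bal`, `y_bal` instead of the raw training set (and only resample the training split, never the test split).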

Upvotes: 3
