Reputation: 1145
I have a data set of about 45,000 samples, each with a binary output of either 0 or 1. But after using the MLPClassifier from the sklearn package, I obtained a model that always outputs 1 no matter what the input is, and the precision of class 0 is zero. I have tried changing the hyperparameters of the model, but the output stays the same. Can anyone suggest a way to get around this?
             precision    recall  f1-score   support

          0       0.00      0.00      0.00     19967
          1       0.57      1.00      0.73     26688

avg / total       0.33      0.57      0.42     46655
PS: Here is my code:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn import metrics

loc = './new_attributes_66.csv'
data = pd.read_csv(loc)

# Scale every column (features and label alike) to the range (-1, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaler.fit(data)
data = scaler.transform(data)
print(data)

X = data[:, 0:64]  # first 64 columns are the features
y = data[:, 65]    # last column is the label (column 64 is unused, as in the original indexing)

X_tr, X_tst, y_tr, y_tst = train_test_split(X, y, test_size=0.1)

clf = MLPClassifier(solver='sgd', alpha=1e-5, hidden_layer_sizes=(40, 121),
                    random_state=0, warm_start=True, tol=1e-7,
                    early_stopping=False, learning_rate='adaptive',
                    learning_rate_init=0.1, max_iter=10000, shuffle=True,
                    verbose=True)
clf.fit(X_tr, y_tr)

# Note: evaluation here is on the full dataset, including the training split
predicted = clf.predict(X)
# print("Accuracy using MLP classifier: ")
print(metrics.precision_score(y, predicted))
# print(confusion_matrix(y_tst, predicted))
print(metrics.classification_report(y, predicted))
# print(clf.coefs_)
Link to the dataset (csv) : https://app.box.com/s/vfqgool2u9ovdc9oyi9elq99aor6c6gk
Update: I have modified my code based on the suggestions below and was able to improve the precision and recall:
             precision    recall  f1-score   support

       -1.0       0.53      0.10      0.17     19967
        1.0       0.58      0.93      0.72     26688

avg / total       0.56      0.58      0.48     46655
This gives an accuracy of 58.14%. In what other ways can the hyperparameters be varied?
Upvotes: 3
Views: 3773
Reputation: 1145
Hey guys, after a suggestion from Mohammed Kasif, I tried the AdaBoostClassifier on the data, scaled to (-1, 1), and obtained the following results:
Accuracy: 0.682432189042

             precision    recall  f1-score   support

       -1.0       0.59      0.56      0.57     19967
        1.0       0.68      0.71      0.70     26688

avg / total       0.64      0.65      0.64     46655
This is a large improvement compared to the 57-58% we were able to get with the MLPClassifier, or even with the AdaBoostClassifier without the scaling. Anyone with better results is free to post their ideas :)
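For anyone who wants to reproduce this, here is a minimal sketch of the pipeline described above (scaling to (-1, 1), then fitting an AdaBoostClassifier); the column indices follow the code in my question, and n_estimators is just an illustrative value, not the exact setting I used:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn import metrics

data = pd.read_csv('./new_attributes_66.csv')

# Scale every column to (-1, 1), as in the original code
data = MinMaxScaler(feature_range=(-1, 1)).fit_transform(data)
X, y = data[:, 0:64], data[:, 65]

X_tr, X_tst, y_tr, y_tst = train_test_split(X, y, test_size=0.1)

clf = AdaBoostClassifier(n_estimators=100, random_state=0)  # n_estimators is an assumption
clf.fit(X_tr, y_tr)

# Evaluate on the held-out split
predicted = clf.predict(X_tst)
print("Accuracy:", metrics.accuracy_score(y_tst, predicted))
print(metrics.classification_report(y_tst, predicted))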
Upvotes: 2
Reputation: 8811
Your data may be suffering from the class imbalance problem: it might be the case that the number of samples with label 1 far outnumbers those with label 0. There are various ways to tackle class imbalance, for example:
- Oversample the minority class (or undersample the majority class) so that both classes are roughly equally represented during training; a sketch of this follows below.
- Use class weights, so that errors on the minority class are penalized more heavily (note that MLPClassifier itself does not expose a class_weight parameter, but many other sklearn estimators do).
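For instance, here is a minimal sketch of the oversampling route using sklearn.utils.resample, assuming X_tr and y_tr come from the train_test_split in your question with 0/1 labels (this is one illustrative approach, not the only one):

import numpy as np
from sklearn.utils import resample

# Split the training set by class; class 0 is the minority class here
X_min, y_min = X_tr[y_tr == 0], y_tr[y_tr == 0]
X_maj, y_maj = X_tr[y_tr == 1], y_tr[y_tr == 1]

# Oversample the minority class with replacement to match the majority size
X_min_up, y_min_up = resample(X_min, y_min, replace=True,
                              n_samples=len(y_maj), random_state=0)

# Recombine into a balanced training set and fit on it as usual
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([y_maj, y_min_up])
clf.fit(X_bal, y_bal)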
You can also try checking out different values of alpha or different shapes of hidden layers. Maybe the current configuration that you are using is not able to learn properly.
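A systematic way to try those out is a grid search. Here is a minimal sketch with GridSearchCV; the parameter values are arbitrary examples to illustrate the idea, not recommendations:

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Example grid: a few alpha values and hidden-layer shapes to compare
param_grid = {
    'alpha': [1e-5, 1e-3, 1e-1],
    'hidden_layer_sizes': [(40,), (100,), (40, 121), (64, 32)],
}

# Macro-averaged F1 is a reasonable metric when the classes are imbalanced
search = GridSearchCV(MLPClassifier(solver='sgd', max_iter=1000, random_state=0),
                      param_grid, scoring='f1_macro', cv=3)
search.fit(X_tr, y_tr)
print(search.best_params_, search.best_score_)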
Upvotes: 3