Reputation: 13
I am building a digit classifier on the MNIST dataset using MLPClassifier, and I observe very strange behaviour: a single-layer classifier does better than a multilayer one, while increasing the number of neurons in the single layer keeps improving accuracy. Why isn't the multilayer network better than the single-layer one? Here is my code:
# grid_scores_ exists only in the legacy sklearn.grid_search module (removed in
# scikit-learn 0.20); with sklearn.model_selection use grid.cv_results_ instead.
from sklearn.grid_search import GridSearchCV
from sklearn.neural_network import MLPClassifier

param_grid = {'hidden_layer_sizes': [400, 300, 200, 100, 70, 50, 20, 10]}
grid = GridSearchCV(MLPClassifier(random_state=1), param_grid, cv=3, scoring='accuracy')
grid.fit(train_data.iloc[:, 1:], train_data.iloc[:, 0])
grid.grid_scores_
output:
[mean: 0.97590, std: 0.00111, params: {'hidden_layer_sizes': 400},
mean: 0.97300, std: 0.00300, params: {'hidden_layer_sizes': 300},
mean: 0.97271, std: 0.00065, params: {'hidden_layer_sizes': 200},
mean: 0.97052, std: 0.00143, params: {'hidden_layer_sizes': 100},
mean: 0.96507, std: 0.00262, params: {'hidden_layer_sizes': 70},
mean: 0.96448, std: 0.00150, params: {'hidden_layer_sizes': 50},
mean: 0.94531, std: 0.00378, params: {'hidden_layer_sizes': 20},
mean: 0.92945, std: 0.00320, params: {'hidden_layer_sizes': 10}]
For multilayer:
param_grid = {'hidden_layer_sizes': [[200], [200, 100], [200, 100, 50], [200, 100, 50, 20], [200, 100, 50, 20, 10]]}
grid = GridSearchCV(MLPClassifier(random_state=1), param_grid, cv=3, scoring='accuracy')
grid.fit(train_data.iloc[:, 1:], train_data.iloc[:, 0])
grid.grid_scores_
Output:
[mean: 0.97271, std: 0.00065, params: {'hidden_layer_sizes': [200]},
mean: 0.97255, std: 0.00325, params: {'hidden_layer_sizes': [200, 100]},
mean: 0.97043, std: 0.00199, params: {'hidden_layer_sizes': [200, 100, 50]},
mean: 0.96755, std: 0.00173, params: {'hidden_layer_sizes': [200, 100, 50, 20]},
mean: 0.96086, std: 0.00511, params: {'hidden_layer_sizes': [200, 100, 50, 20, 10]}]
About dataset: 28*28 pixel images of handwritten digits.
Upvotes: 1
Views: 644
Reputation: 210932
It seems to me that your model is overfitting. You can check that by comparing the train scores (use the parameter return_train_score=True) with the test scores.
If it's already overfitting, then making your NN deeper or adding units to the hidden layers may make it worse. So try to get more data and/or find a proper alpha (regularization parameter) to make your model perform better.
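A minimal sketch of both checks. Since the asker's train_data isn't available here, it uses scikit-learn's small built-in 8x8 digits set as a stand-in (the question uses 28x28 MNIST), and the hidden-layer size and alpha grid are illustrative choices, not the asker's settings:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Stand-in data: sklearn's 8x8 digits set (the question uses 28x28 MNIST).
X, y = load_digits(return_X_y=True)

# Search the regularization strength alpha while recording training scores,
# so the train/validation gap (the overfitting signal) is visible per setting.
param_grid = {'alpha': [1e-4, 1e-2, 1.0]}
grid = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(50,), max_iter=300, random_state=1),
    param_grid, cv=3, scoring='accuracy', return_train_score=True)
grid.fit(X, y)

res = grid.cv_results_
for alpha, tr, te in zip(res['param_alpha'],
                         res['mean_train_score'],
                         res['mean_test_score']):
    # A large positive gap between train and test accuracy means overfitting.
    print('alpha=%g  train=%.4f  test=%.4f  gap=%.4f' % (alpha, tr, te, tr - te))
```

If the gap stays large at every alpha, more data (or early stopping) is the next lever to try.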
Upvotes: 3
Reputation: 685
I can only give a theoretical answer:
There's a result called the "universal approximation theorem", which roughly says that any continuous function (and therefore anything a deeper MLP computes) can be approximated arbitrarily well by an MLP with only one hidden layer, given enough units.
So this could be your problem: your one-layer MLP has the right parameters for this task, while on the other hand you haven't yet found the right ones for the multilayer network.
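For reference, a common informal statement of the theorem (Cybenko 1989 / Hornik 1991) can be sketched as: for a nonconstant, bounded, continuous activation sigma, any continuous f on a compact set can be matched uniformly by a one-hidden-layer network.

```latex
% Universal approximation (informal): for any continuous f on a compact
% K \subset \mathbb{R}^n and any \varepsilon > 0 there exist N, v_i, w_i, b_i
% such that the one-hidden-layer network g is uniformly \varepsilon-close to f:
g(x) = \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right),
\qquad
\sup_{x \in K} \left| f(x) - g(x) \right| < \varepsilon .
```

Note that the theorem is about expressive power only; it says nothing about whether training will actually find such weights, which is the practical difficulty here.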
EDIT: More layers doesn't mean better performance. In ANNs, bigger is sometimes not better.
Upvotes: 1