Salamat

Reputation: 13

Why is a single-layer MLP better than a multilayer MLP for digit classification?

I am building a digit classifier on the MNIST dataset using MLPClassifier, and I observe very strange behaviour: a single-hidden-layer classifier outperforms the multilayer ones, while increasing the number of neurons in that single layer keeps improving accuracy. Why isn't the multilayer network better than the single-layer one? Here is my code:

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

param_grid = {'hidden_layer_sizes': [400, 300, 200, 100, 70, 50, 20, 10]}
grid = GridSearchCV(MLPClassifier(random_state=1), param_grid, cv=3, scoring='accuracy')
grid.fit(train_data.iloc[:, 1:], train_data.iloc[:, 0])
grid.grid_scores_  # removed in scikit-learn 0.20; use grid.cv_results_ on newer versions

output:

[mean: 0.97590, std: 0.00111, params: {'hidden_layer_sizes': 400},
 mean: 0.97300, std: 0.00300, params: {'hidden_layer_sizes': 300},
 mean: 0.97271, std: 0.00065, params: {'hidden_layer_sizes': 200},
 mean: 0.97052, std: 0.00143, params: {'hidden_layer_sizes': 100},
 mean: 0.96507, std: 0.00262, params: {'hidden_layer_sizes': 70},
 mean: 0.96448, std: 0.00150, params: {'hidden_layer_sizes': 50},
 mean: 0.94531, std: 0.00378, params: {'hidden_layer_sizes': 20},
 mean: 0.92945, std: 0.00320, params: {'hidden_layer_sizes': 10}]

For multilayer:

param_grid = {'hidden_layer_sizes': [[200], [200, 100], [200, 100, 50], [200, 100, 50, 20], [200, 100, 50, 20, 10]]}
grid = GridSearchCV(MLPClassifier(random_state=1), param_grid, cv=3, scoring='accuracy')
grid.fit(train_data.iloc[:, 1:], train_data.iloc[:, 0])
grid.grid_scores_

Output:

[mean: 0.97271, std: 0.00065, params: {'hidden_layer_sizes': [200]},
 mean: 0.97255, std: 0.00325, params: {'hidden_layer_sizes': [200, 100]},
 mean: 0.97043, std: 0.00199, params: {'hidden_layer_sizes': [200, 100, 50]},
 mean: 0.96755, std: 0.00173, params: {'hidden_layer_sizes': [200, 100, 50, 20]},
 mean: 0.96086, std: 0.00511, params: {'hidden_layer_sizes': [200, 100, 50, 20, 10]}]

About the dataset: 28x28-pixel images of handwritten digits.

Upvotes: 1

Views: 644

Answers (2)

MaxU - stand with Ukraine

Reputation: 210932

It seems to me that your model is overfitting. You can check that by comparing the train scores (pass return_train_score=True) with the test scores.

If it is already overfitting, then making the network deeper or adding more units per hidden layer may make things worse. Try getting more data and/or finding a proper alpha (the L2 regularization parameter) to make your model perform better.
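A minimal sketch of that check, using scikit-learn's bundled load_digits dataset in place of MNIST so it runs quickly; the hidden-layer size and the alpha values here are illustrative, not tuned:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# load_digits (8x8 images) stands in for MNIST to keep the example fast
X, y = load_digits(return_X_y=True)

param_grid = {'alpha': [1e-4, 1e-2, 1.0]}
grid = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=100, random_state=1),
    param_grid, cv=3, scoring='accuracy',
    return_train_score=True,  # keep train scores so they can be compared
)
grid.fit(X, y)

# a large train/test gap signals overfitting; watch how it shrinks with alpha
for a, tr, te in zip(grid.cv_results_['param_alpha'],
                     grid.cv_results_['mean_train_score'],
                     grid.cv_results_['mean_test_score']):
    print(f"alpha={a}: train={tr:.4f}, test={te:.4f}, gap={tr - te:.4f}")
```

If the gap stays large even for the best alpha, more data (or a smaller network) is the next thing to try.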

Upvotes: 3

Daniel Rodríguez

Reputation: 685

I can only give a theoretical answer:

There is a theorem called the universal approximation theorem, which roughly states that an MLP with a single hidden layer and enough units can approximate any continuous function, so anything a deeper MLP represents can also be represented by one hidden layer.

So this could be your problem: your single-layer MLP has the right parameters for this task, while on the other hand you have not yet found the right ones for the multilayer networks.

EDIT: More layers do not automatically mean better performance; with neural networks, bigger is not always better.

Upvotes: 1
