Reputation: 193
I have a question about my NN model. I am using Keras from Python. My training set consists of 1000 samples, each with 4320 features. There are 10 categories, and my Y contains numpy arrays of 10 elements, with 0 in every position except one (one-hot encoded).
However, my NN stops learning after the first epoch, so I probably have my model wrong. This is my first attempt at building a NN model, and I must have gotten a couple of things wrong.
Epoch 1/150
1000/1000 [==============================] - 40s 40ms/step - loss: 6.7110 - acc: 0.5796
Epoch 2/150
1000/1000 [==============================] - 39s 39ms/step - loss: 6.7063 - acc: 0.5800
Epoch 3/150
1000/1000 [==============================] - 38s 38ms/step - loss: 6.7063 - acc: 0.5800
Epoch 4/150
1000/1000 [==============================] - 39s 39ms/step - loss: 6.7063 - acc: 0.5800
Epoch 5/150
1000/1000 [==============================] - 38s 38ms/step - loss: 6.7063 - acc: 0.5800
Epoch 6/150
1000/1000 [==============================] - 38s 38ms/step - loss: 6.7063 - acc: 0.5800
Epoch 7/150
1000/1000 [==============================] - 40s 40ms/step - loss: 6.7063 - acc: 0.5800
Epoch 8/150
1000/1000 [==============================] - 39s 39ms/step - loss: 6.7063 - acc: 0.5800
Epoch 9/150
1000/1000 [==============================] - 40s 40ms/step - loss: 6.7063 - acc: 0.5800
And this is part of my NN code:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(4320, input_dim=4320, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10)
So, my X is a numpy array of length 1000 containing numpy arrays of 4320 elements each, and my Y is a numpy array of length 1000 containing numpy arrays of 10 elements each (the categories).
Am I doing something wrong, or can it simply not learn from this training set? (With 1-NN and Manhattan distance I'm getting ~80% accuracy on this training set.)
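For reference, a minimal sketch of such a 1-NN baseline with scikit-learn (the 5-fold cross-validation here is just one possible evaluation protocol, and is an assumption):
# Sketch: 1-NN baseline with Manhattan distance; 5-fold CV is assumed
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X_flat = np.asarray(X)                    # shape (1000, 4320)
y_idx = np.argmax(np.asarray(Y), axis=1)  # one-hot targets -> class indices
knn = KNeighborsClassifier(n_neighbors=1, metric='manhattan')
print(cross_val_score(knn, X_flat, y_idx, cv=5).mean())  # ~0.8 reported above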
L.E.: After normalizing the data, this is the output of my first 10 epochs:
Epoch 1/150
1000/1000 [==============================] - 41s 41ms/step - loss: 7.9834 - acc: 0.4360
Epoch 2/150
1000/1000 [==============================] - 41s 41ms/step - loss: 7.2943 - acc: 0.5080
Epoch 3/150
1000/1000 [==============================] - 39s 39ms/step - loss: 9.0326 - acc: 0.4070
Epoch 4/150
1000/1000 [==============================] - 39s 39ms/step - loss: 8.7106 - acc: 0.4320
Epoch 5/150
1000/1000 [==============================] - 40s 40ms/step - loss: 7.7547 - acc: 0.4900
Epoch 6/150
1000/1000 [==============================] - 44s 44ms/step - loss: 7.2591 - acc: 0.5270
Epoch 7/150
1000/1000 [==============================] - 42s 42ms/step - loss: 8.5002 - acc: 0.4560
Epoch 8/150
1000/1000 [==============================] - 41s 41ms/step - loss: 9.9525 - acc: 0.3720
Epoch 9/150
1000/1000 [==============================] - 40s 40ms/step - loss: 9.7160 - acc: 0.3920
Epoch 10/150
1000/1000 [==============================] - 39s 39ms/step - loss: 9.3523 - acc: 0.4140
It looks like the loss starts fluctuating now, so that seems to be a good sign.
Upvotes: 0
Views: 71
Reputation: 11225
It seems like your classes are mutually exclusive, since your target arrays are one-hot encoded (i.e. you never have to predict 2 classes at the same time). In that case, you should use softmax on your last layer to produce a probability distribution and train using categorical_crossentropy. In fact, you can just set your targets as class indices, e.g. Y = [2, 4, 0, 1], and train with sparse_categorical_crossentropy, which will save you the work of creating a 2D array of shape (samples, 10).
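Concretely, a minimal sketch of both options, reusing the layer sizes and hyper-parameters from your snippet:
# Option 1: keep one-hot targets, use softmax + categorical_crossentropy
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

model = Sequential()
model.add(Dense(4320, input_dim=4320, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10)

# Option 2: integer class indices + sparse_categorical_crossentropy
y_sparse = np.argmax(Y, axis=1)  # e.g. [2, 4, 0, 1, ...]
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y_sparse, epochs=150, batch_size=10)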
You also have a lot of features, so the performance of your network will most likely depend on how you pre-process your data. For continuous inputs it's wise to normalise them, and for discrete inputs one-hot encoding helps the learning.
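For example, a simple standardisation sketch (zero mean, unit variance per feature; the small epsilon is just a guard against constant features):
# Sketch: standardise continuous inputs feature-wise before training
import numpy as np

X = np.asarray(X, dtype='float32')
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # zero mean, unit variance
model.fit(X, Y, epochs=150, batch_size=10)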
Upvotes: 2