Hung Dang

Reputation: 53

Why does a Keras neural network predict the same number for all different images?

I'm trying to use a Keras neural network (from TensorFlow) to recognize handwritten digits, but I don't know why predict() returns the same result for every input image.

Here is the code:

  import tensorflow as tf

  ### Train dataset ###
  mnist = tf.keras.datasets.mnist
  (x_train, y_train), (x_test, y_test) = mnist.load_data()
  x_train = x_train/255
  x_test = x_test/255

  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Flatten(input_shape=(28,28)))
  model.add(tf.keras.layers.Dense(units=128,activation=tf.nn.relu))
  model.add(tf.keras.layers.Dense(units=10,activation=tf.nn.softmax))

  model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

  model.fit(x_train, y_train, epochs=5)

The result looks like this:

Epoch 1/5
1875/1875 [==============================] - 2s 672us/step - loss: 0.2620 - accuracy: 0.9248
Epoch 2/5
1875/1875 [==============================] - 1s 567us/step - loss: 0.1148 - accuracy: 0.9658
Epoch 3/5
1875/1875 [==============================] - 1s 559us/step - loss: 0.0784 - accuracy: 0.9764
Epoch 4/5
1875/1875 [==============================] - 1s 564us/step - loss: 0.0596 - accuracy: 0.9817
Epoch 5/5
1875/1875 [==============================] - 1s 567us/step - loss: 0.0462 - accuracy: 0.9859

The code I use to test an image is below:

  import cv2 as cv
  import matplotlib.pyplot as plt
  import numpy as np

  img = cv.imread('path/to/1.png')
  img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
  img = cv.resize(img,(28,28))
  img = np.array([img])
    
  if cv.countNonZero(255 - img[0]) == 0:  # true when the image is entirely white
     print('')
  img = np.invert(img)
    
  plt.imshow(img[0])
  plt.show()
    
  prediction = model.predict(img)
  result = np.argmax(prediction)
  print(prediction)
  print(f'Result: {result}')

The result is:

Input with number 1

plt.show() output: [image: PlT show 1]

[[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
Result: 3

Input with number 2

plt.show() output: [image: PlT show 2]

[[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
Result: 3

Upvotes: 1

Views: 321

Answers (1)

Innat

Reputation: 17239

Normalize your data at inference time the same way you did for the training set:

img = np.array([img]) / 255

Check this answer (Inference) for more details.
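
As a rough illustration of why skipping the division can produce the saturated one-hot output seen in the question, here is a minimal pure-Python sketch. The logit values are made up for illustration, and the sketch assumes the network responds roughly linearly to its input (exactly true only for a ReLU network without biases), so an input that is 255 times larger gives logits that are roughly 255 times larger:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for an input scaled to [0, 1], as during training.
logits_normalized = [0.5, 1.2, 0.1, 1.5]

# Assumption: feeding raw 0..255 pixels scales the logits by about 255.
logits_raw = [255 * v for v in logits_normalized]

probs_normalized = softmax(logits_normalized)  # spread over several classes
probs_raw = softmax(logits_raw)                # collapses onto a single class

print([round(p, 3) for p in probs_normalized])
print([round(p, 3) for p in probs_raw])
```

With the unnormalized input the softmax collapses to what prints as exactly 0 and 1, which matches the shape of the predictions in the question. Which class wins depends on the trained weights, so this only shows the saturation, not why it happens to be class 3.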


Based on your 3rd comment, here are some details.

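import cv2
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

def input_prepare(img):
    img = cv2.resize(img, (28, 28))    # match the model's 28x28 input
    img = cv2.bitwise_not(img)         # invert: MNIST digits are white on black

    img = tf.cast(tf.divide(img, 255), tf.float64)  # scale to [0, 1] as in training
    img = tf.expand_dims(img, axis=0)  # add the batch dimension
    return img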

img = cv2.imread('/content/1.png')
orig = img.copy() # save for plotting later on 

img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # gray scaling 
img = input_prepare(img)

plt.imshow(tf.reshape(img, shape=[28, 28]))

plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))
plt.title(np.argmax(model.predict(img)))
plt.show()

It works as expected. However, resizing the image down to 28x28 can break up the digit's strokes, so part of its shape is lost. The model tolerates mild degradation, but when the damage is severe it predicts the wrong class. Here is one such case, which the model gets wrong:

plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))
plt.title(np.argmax(model.predict(img)))
plt.show()
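
To make the "broken digit" effect concrete, here is a small pure-Python toy, not OpenCV's actual implementation. It shrinks an image containing a one-pixel-wide stroke in two ways: by keeping every 4th pixel, and by averaging each 4x4 block. The function names are my own. As far as I know, cv2.resize defaults to bilinear interpolation, which only looks at a small neighbourhood per output pixel, so thin strokes can be skipped in a similar way when shrinking a lot; the OpenCV docs suggest cv2.INTER_AREA for shrinking, which may help here, though I have not tested it on this image:

```python
def downsample_stride(img, factor):
    """Keep every `factor`-th pixel (point sampling)."""
    return [row[::factor] for row in img[::factor]]

def downsample_area(img, factor):
    """Average each factor x factor block (area-style shrinking)."""
    out = []
    for r in range(0, len(img), factor):
        out_row = []
        for c in range(0, len(img[0]), factor):
            block = [img[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

# 8x8 white image (255) with a one-pixel-wide dark stroke in column 1.
image = [[0 if c == 1 else 255 for c in range(8)] for _ in range(8)]

strided = downsample_stride(image, 4)   # samples columns 0 and 4 only
averaged = downsample_area(image, 4)    # stroke survives as a gray value

print(strided)
print(averaged)
```

Point sampling misses the stroke entirely, while block averaging keeps it as a lighter gray, so the digit stays connected at the cost of some contrast.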

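To fix this, we can apply cv2.erode after resizing to thicken the strokes. At that point the digit is still dark on a light background, so eroding the bright background adds dark pixels to the digit. For example: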

def input_prepare(img):            
    img = cv2.resize(img, (28, 28))   
    img = cv2.erode(img, np.ones((2, 2), np.uint8))  # thicken the dark strokes
    img = cv2.bitwise_not(img)   

    img = tf.cast(tf.divide(img, 255) , tf.float64)              
    img = tf.expand_dims(img, axis=0)   
    return img 

This is perhaps not the best approach, but the model now has an easier time with this input.

plt.imshow(cv2.cvtColor(orig, cv2.COLOR_BGR2RGB))
plt.title(np.argmax(model.predict(img)))
plt.show()
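
If it is unclear why erosion "adds" pixels, this pure-Python sketch may help. Grayscale erosion with a flat kernel is a minimum filter, so on a light background it grows the dark regions. The helper erode_2x2 is a hypothetical stand-in written for illustration; it is meant to mimic a 2x2 kernel but is not guaranteed to match cv2.erode's anchor and border handling exactly:

```python
def erode_2x2(img):
    """Replace each pixel with the minimum over the 2x2 window ending at it."""
    height, width = len(img), len(img[0])
    out = []
    for r in range(height):
        row = []
        for c in range(width):
            # Clamp at the top/left edge so the window stays inside the image.
            window = [img[rr][cc]
                      for rr in (max(r - 1, 0), r)
                      for cc in (max(c - 1, 0), c)]
            row.append(min(window))
        out.append(row)
    return out

# 5x5 white image (255) with a one-pixel-wide dark stroke in column 2.
image = [[0 if c == 2 else 255 for c in range(5)] for _ in range(5)]
eroded = erode_2x2(image)

dark_before = sum(v == 0 for row in image for v in row)
dark_after = sum(v == 0 for row in eroded for v in row)
print(dark_before, dark_after)
```

The stroke goes from one pixel wide to two pixels wide, which is the thickening that input_prepare relies on before inverting the image.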

Upvotes: 1
