prog mob

Reputation: 41

Basic binary classification with Keras not working

I am a newbie to ML and want to perform the simplest classification with Keras: if y > 0.5, then label = 1 (x doesn't matter), and if y < 0.5, then label = 0 (x doesn't matter).

As far as I understand, one neuron with sigmoid activation can perform this linear classification.

import tensorflow.keras as keras
import math

import numpy as np
import matplotlib as mpl

train_data = np.empty((0,2),float)
train_labels = np.empty((0,1),float)


train_data = np.append(train_data, [[0, 0]], axis=0)
train_labels = np.append(train_labels, 0)

train_data = np.append(train_data, [[1, 0]], axis=0)
train_labels = np.append(train_labels, 0)

train_data = np.append(train_data, [[0, 1]], axis=0)
train_labels = np.append(train_labels, 1)

train_data = np.append(train_data, [[1, 1]], axis=0)
train_labels = np.append(train_labels, 1)


model = keras.models.Sequential()
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Dense(1, input_dim = 2, activation='sigmoid'))

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(train_data, train_labels, epochs=5)

Training:

Epoch 1/5
4/4 [==============================] - 1s 150ms/step - loss: 0.4885 - acc: 0.7500
Epoch 2/5
4/4 [==============================] - 0s 922us/step - loss: 0.4880 - acc: 0.7500
Epoch 3/5
4/4 [==============================] - 0s 435us/step - loss: 0.4875 - acc: 0.7500
Epoch 4/5
4/4 [==============================] - 0s 396us/step - loss: 0.4869 - acc: 0.7500
Epoch 5/5
4/4 [==============================] - 0s 465us/step - loss: 0.4863 - acc: 0.7500

And the predictions are not good:

predict_data = np.empty((0,2),float)
predict_data = np.append(predict_data, [[0, 0]], axis=0)
predict_data = np.append(predict_data, [[1, 0]], axis=0)
predict_data = np.append(predict_data, [[1, 1]], axis=0)
predict_data = np.append(predict_data, [[1, 1]], axis=0)

predict_labels = model.predict(predict_data)
print(predict_labels)

[[0.49750862]
 [0.51616406]
 [0.774486  ]
 [0.774486  ]]

How can I solve this problem?

I also tried training the model on 2000 points (in my mind, more than enough for this simple problem), but with no success...

train_data = np.empty((0,2),float)
train_labels = np.empty((0,1),float)

for i in range(0, 1000):
  train_data = np.append(train_data, [[i, 0]], axis=0)
  train_labels = np.append(train_labels, 0)
  train_data = np.append(train_data, [[i, 1]], axis=0)
  train_labels = np.append(train_labels, 1)

model = keras.models.Sequential()
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Dense(1, input_dim = 2, activation='sigmoid'))

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(train_data, train_labels, epochs=5)

Epoch 1/5
2000/2000 [==============================] - 1s 505us/step - loss: 7.9669 - acc: 0.5005
Epoch 2/5
2000/2000 [==============================] - 0s 44us/step - loss: 7.9598 - acc: 0.5010
Epoch 3/5
2000/2000 [==============================] - 0s 45us/step - loss: 7.9511 - acc: 0.5010
Epoch 4/5
2000/2000 [==============================] - 0s 50us/step - loss: 7.9408 - acc: 0.5010
Epoch 5/5
2000/2000 [==============================] - 0s 53us/step - loss: 7.9279 - acc: 0.5015


Prediction:

predict_data = np.empty((0,2),float)
predict_data = np.append(predict_data, [[0, 0]], axis=0)
predict_data = np.append(predict_data, [[1, 0]], axis=0)
predict_data = np.append(predict_data, [[1, 1]], axis=0)
predict_data = np.append(predict_data, [[1, 1]], axis=0)

predict_labels = model.predict(predict_data)
print(predict_labels)

[[0.6280617 ]
 [0.48020774]
 [0.8395983 ]
 [0.8395983 ]]

A prediction of 0.6280617 for (0, 0) is very bad.

Upvotes: 1

Views: 2743

Answers (2)

sdcbr

Reputation: 7129

Your problem setup is a bit weird in the sense that you only have four data points, yet you want to learn model weights with gradient descent (or Adam). Also, the batchnorm does not really make sense here, so I would suggest removing it.

Apart from that, your network is predicting numbers between 0 and 1 ('probabilities'), not class labels. To get the predicted class labels, you can use model.predict_classes(predict_data) instead of model.predict().
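
Equivalently, you can threshold the probabilities yourself; a minimal sketch, assuming the model and predict_data from the question:

import numpy as np

# Sigmoid outputs are probabilities in (0, 1), shape (n_samples, 1)
probs = model.predict(predict_data)
# Threshold at 0.5 to turn probabilities into hard class labels
labels = (probs > 0.5).astype(int)
print(labels)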

If you are new to ML and you want to experiment with toy datasets, you can also have a look at scikit-learn, which is a library that implements more traditional ML algorithms, whereas Keras is specifically for deep learning. Consider for instance logistic regression, which is the same thing as a single neuron with a sigmoid activation but is implemented with different solvers in sklearn:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model = model.fit(train_data, train_labels)
model.predict(predict_data)
> array([0., 0., 1., 1.])

The scikit-learn website contains lots of examples that illustrate these different algorithms on toy datasets.

In your second scenario, you are not allowing any variation in the second feature, which is the only one that matters. If you want to train the model on 1000 data points, you can generate data around the four points in your original dataset and add some random noise:

import keras
import numpy as np
import matplotlib.pyplot as plt

# Generate toy dataset
train_data = np.random.randint(0, 2, size=(1000, 2))
# Add Gaussian noise
train_data = train_data + np.random.normal(scale=2e-1, size=train_data.shape)
train_labels = (train_data[:, 1] > 0.5).astype(int)

# Visualize the data, color-coded by their classes
fig, ax = plt.subplots()
ax.scatter(train_data[:, 0], train_data[:, 1], c=train_labels)

[Figure: scatter plot of the training data, color-coded by class]

# Train a simple neural net
model = keras.models.Sequential()
model.add(keras.layers.Dense(1, input_shape= (2,), activation='sigmoid'))
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

history = model.fit(train_data, train_labels, epochs=20)

You can use the history object to visualize how the loss or accuracy evolved during training:

fig, ax = plt.subplots()
ax.plot(history.history['acc'])

[Figure: accuracy per epoch during training]

Finally, test the model on some test data:

from sklearn.metrics import accuracy_score
# Test on test data
test_data = np.random.randint(0, 2, size=(100, 2))
# Add Gaussian noise
test_data = test_data + np.random.normal(scale=2e-1, size=test_data.shape)
test_labels = (test_data[:, 1] > 0.5).astype(int)

accuracy_score(test_labels, model.predict_classes(test_data))

However, be aware that you could solve the entire problem by just using the second coordinate. So it works just fine if you throw the first one away:

# Use only second coordinate
model = keras.models.Sequential()
model.add(keras.layers.Dense(1, input_shape= (1,), activation='sigmoid'))
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

history = model.fit(train_data[:, 1:], train_labels, epochs=20)

This model quickly achieves high accuracy:

[Figure: accuracy per epoch during training]

Upvotes: 3

Daniel T.

Reputation: 489

Yes, first of all, BatchNorm and Adam don't really make sense in this situation. And the reason your predictions don't work is that your model is too weak to solve your equations. If you try to solve it mathematically, you get:

sigmoid(w1*x1 + w2*x2 + b0) = y

So with your training data you get:

1) sigmoid(b0) = 0 => b0 -> -infinity
2) sigmoid(w1 + b0) = 0 => w1 can be any finite constant
3) sigmoid(w2 + b0) = 1 => w2 >> |b0| (already starting to break...)
4) sigmoid(w1 + w2 + b0) = 1 => same as 3)

So, in my opinion, the optimizer will oscillate between 2) and 3), pushing each weight higher than the other, and you will never reach exact predictions with this model (see the numeric sketch below).
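
To see the asymptote numerically, here is a minimal sketch (with hypothetical finite weight values, not fitted ones) showing that the sigmoid only approaches 0 and 1 but never reaches them, so the cross-entropy loss keeps pushing the weights towards infinity:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights: b0 very negative, w2 large enough to dominate |b0|
b0, w1, w2 = -10.0, 0.0, 20.0

for x1, x2, y in [(0, 0, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1)]:
    p = sigmoid(w1 * x1 + w2 * x2 + b0)
    print((x1, x2), round(p, 6), 'target', y)

# Output is ~0.000045 for the 0-labelled points and ~0.999955 for the
# 1-labelled ones: correct after thresholding, but never exactly 0 or 1.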

And if you look at the 75% accuracy, it makes sense: you have 4 training examples and, as stated above, one prediction will not be possible, so you get 3/4 accuracy.
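
The same 3/4 pattern shows up in the prediction output from the question; thresholding those probabilities at 0.5 gives exactly one misclassification (a small sketch using the numbers reported there):

import numpy as np

# Probabilities from the question for (0,0), (1,0), (1,1), (1,1)
probs = np.array([0.49750862, 0.51616406, 0.774486, 0.774486])
true_labels = np.array([0, 0, 1, 1])
pred_labels = (probs > 0.5).astype(int)     # -> [0, 1, 1, 1]
print((pred_labels == true_labels).mean())  # 0.75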

Upvotes: 1
