Bobby

Reputation: 217

(Single Layer) Perceptron in PyTorch, bad convergence

I'm trying to develop a simple single-layer perceptron with PyTorch (v0.4.0) to classify the boolean AND operation. I want to use autograd to calculate the gradients of the weights and bias and then update them in an SGD manner.

The code is very simple and is the following:

import torch

# AND points and labels
data = torch.tensor([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
    ], dtype=torch.float32)
labels = torch.tensor([0,0,0,1], dtype=torch.float32)

weights = torch.zeros(2, dtype=torch.float32, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)
losses = []
epochs = 100
eta = 0.01
for epoch in range(epochs):
    total_loss = 0
    for idx in range(4):
        # take current input
        X = data[idx,:]
        y = labels[idx]

        # compute output and loss
        out = torch.add(torch.dot(weights, X), bias)
        loss = (out-y).pow(2)
        total_loss += loss.item()
        # backpropagation
        loss.backward()

        # update parameters (SGD step)
        with torch.no_grad():
            weights -= eta * weights.grad
            bias -= eta * bias.grad
            # reset gradient to zero
            weights.grad.zero_()
            bias.grad.zero_()
    losses.append(total_loss)

The model converges, as you can see from the learning curve (loss over epochs), but the resulting plane (plotted in orange; the top-right point has label 1, the others 0) classifies with 50% accuracy.

I tried different initial parameters and also used the SGD optimizer from PyTorch, but nothing changed. I know that MSE is a regression loss, but I don't think the problem is there.
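For reference, this is a minimal sketch of the same loop rewritten with torch.optim.SGD (same data, labels, eta and initialisation as above); it converges to the same solution:

optimizer = torch.optim.SGD([weights, bias], lr=eta)
for epoch in range(epochs):
    total_loss = 0
    for idx in range(4):
        X = data[idx, :]
        y = labels[idx]
        out = torch.add(torch.dot(weights, X), bias)
        loss = (out - y).pow(2)
        total_loss += loss.item()
        optimizer.zero_grad()  # reset gradients
        loss.backward()        # backpropagation
        optimizer.step()       # SGD update of weights and bias
    losses.append(total_loss)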

Any ideas?

Update: the plane is computed and drawn with these lines of code

import numpy as np
import matplotlib.pyplot as plt

xr = np.linspace(0, 1, 10)
yr = (-1 / weights[1].item()) * (weights[0].item() * xr + bias.item())
plt.plot(xr, yr, '-')

Upvotes: 1

Views: 1293

Answers (2)

iacolippo

Reputation: 4513

The equation you use to compute the plane

yr = (-1 / weights[1].item()) * (weights[0].item() * xr  + bias.item())

is derived for the case where y_i ∈ {+1, -1} and there is a sign function: it comes from looking for the plane that separates positive and negative examples. That assumption no longer holds once you change the targets.
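Spelled out (a sketch of that derivation, writing w1, w2 for the two weights): the sign-based perceptron assigns the positive class when w1*x1 + w2*x2 + b > 0, so the decision boundary is the set of points where

w1*x1 + w2*x2 + b = 0

and solving for x2 gives x2 = (-1 / w2) * (w1*x1 + b), which is exactly the yr line above. It only describes the boundary when the targets are -1 and +1 and the output is thresholded at zero.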

If you draw this:

x1 = np.linspace(0, 1, 10)
x2 = np.linspace(0, 1, 10)
X, Y = np.meshgrid(x1, x2)
w1, w2 = weights.detach().numpy()  # weights has shape (2,) in the question's code
b = bias.detach().numpy()[0]
Z = w1*X + w2*Y + b

which is the correct plane in 3D, you get the correct separation: a plane in 3D space separating the examples of the two classes.
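A minimal sketch of how to render that surface together with the four points, assuming the X, Y, Z above and the data/labels tensors from the question:

from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, enables the 3d projection
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, alpha=0.5)                 # the learned plane
ax.scatter(data[:, 0].numpy(), data[:, 1].numpy(),  # the four AND points
           labels.numpy(), c=labels.numpy())
plt.show()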

You can get a correct separation with your formula if you shift it by an offset that depends on the average of the labels, like:

yr = (-1 / weights[1].item()) * (weights[0].item() * xr  + bias.item() - 0.5)

but I can't quite justify it formally.
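One informal way to see where the 0.5 comes from (a sketch, not a formal argument): with targets 0 and 1, the natural decision threshold for the regression output is the midpoint between the two label values, 0.5, so the boundary is

w1*xr + w2*yr + b = 0.5

which rearranges to exactly the offset formula above.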

Upvotes: 1

Bobby

Reputation: 217

I managed to solve the problem in 2 different ways:

Method 1 - Change the labels to -1 and 1
By simply changing the labels from (0, 1) to (-1, 1), the plane is computed correctly.

Hence, the new labels (same data) are:

labels = torch.tensor([-1,-1,-1,1], dtype=torch.float32)

Method 2 - Add a sigmoid function after out
With (0, 1) labels, add a sigmoid function just after computing out, in this way:

out = torch.add(torch.dot(weights, X), bias)
out = torch.sigmoid(out)

I think that method 1 accounts for the sign function of the perceptron, as the plane must discriminate points based on the sign of the output.
Method 2 adapts this reasoning to (0, 1) labels by using a squashing function.
These are just tentative, partial explanations; feel free to comment below with more accurate ones.
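A sketch of why both fixes make the original plotting formula work again (an informal reading, not a proof): with (-1, 1) labels the class is given by the sign of out, and with the sigmoid the 0/1 threshold sigmoid(z) = 0.5 is reached exactly at z = 0. In both cases the decision boundary is the set of points where

weights[0]*x1 + weights[1]*x2 + bias = 0

which is exactly what the line yr = (-1 / weights[1]) * (weights[0] * xr + bias) plots.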

Upvotes: 0
