Luke Vincent

Reputation: 1323

Why does my tanh activation function perform so badly?

I have two Perceptron algorithms, both identical except for the activation function. One uses a single step function, 1 if u >= 0 else -1; the other uses the tanh function, np.tanh(u).

I expected tanh to outperform the step function, but in fact it performs terribly in comparison. Have I done something wrong here, or is there a reason it under-performs on this problem set?

[Plot: training error curves for the step and tanh perceptrons]

import numpy as np
import matplotlib.pyplot as plt

# generate 20 two-dimensional training data
# data must be linearly separable

# C1: u = (0,0) / E = [1 0; 0 1]; C2: u = (4,0), E = [1 0; 0 1] where u, E represent centre & covariance matrix of the
# Gaussian distribution respectively


def step(u):
    return 1 if u >= 0 else -1


def sigmoid(u):
    return np.tanh(u)

c1mean = [0, 0]
c2mean = [4, 0]
c1cov = [[1, 0], [0, 1]]
c2cov = [[1, 0], [0, 1]]
x = np.ones((40, 3))
w = np.zeros(3)     # [0, 0, 0]
w2 = np.zeros(3)    # second set of weights to see how another classifier compares
t = []  # target array

# +1 for the first 20 then -1
for i in range(0, 40):
    if i < 20:
        t.append(1)
    else:
        t.append(-1)

x1, y1 = np.random.multivariate_normal(c1mean, c1cov, 20).T
x2, y2 = np.random.multivariate_normal(c2mean, c2cov, 20).T

# fill the first column of x with x1 & x2 and the second column with y1 & y2;
# the third column stays 1 and acts as the bias input
for i in range(len(x)):
    if i >= 20:
        x[i, 0] = x2[(i-20)]
        x[i, 1] = y2[(i-20)]
    else:
        x[i, 0] = x1[i]
        x[i, 1] = y1[i]

errors = []
errors2 = []
lr = 0.0001
n = 10

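# train the first perceptron (step activation): n passes over the data using the perceptron learning rule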
for i in range(n):
    count = 0
    for row in x:
        dot = np.dot(w, row)
        response = step(dot)
        errors.append(t[count] - response)
        w += lr * (row * (t[count] - response))
        count += 1

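# train the second perceptron (tanh activation) with the same update rule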
for i in range(n):
    count = 0
    for row in x:
        dot = np.dot(w2, row)
        response = sigmoid(dot)
        errors2.append(t[count] - response)
        w2 += lr * (row * (t[count] - response))
        count += 1

print(errors[-1], errors2[-1])

# distribution
plt.figure(1)
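# decision boundary w[0]*x + w[1]*y + w[2] = 0, drawn through its x- and y-intercepts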
plt.plot((-(w[2]/w[0]), 0), (0, -(w[2]/w[1])))
plt.plot(x1, y1, 'x')
plt.plot(x2, y2, 'ro')
plt.axis('equal')
plt.title('Heaviside')

# training error
plt.figure(2)
plt.ylabel('error')
plt.xlabel('iterations')
plt.plot(errors)
plt.title('Heaviside Error')

plt.figure(3)
plt.plot((-(w2[2]/w2[0]), 0), (0, -(w2[2]/w2[1])))
plt.plot(x1, y1, 'x')
plt.plot(x2, y2, 'ro')
plt.axis('equal')
plt.title('Sigmoidal')

plt.figure(4)
plt.ylabel('error')
plt.xlabel('iterations')
plt.plot(errors2)
plt.title('Sigmoidal Error')

plt.show()

Edit: Even from the error plots I've displayed, the tanh function shows some convergence, so it's reasonable to assume that simply increasing the iterations or reducing the learning rate would allow it to reduce its error. However, what I'm really asking is: bearing in mind the significantly better performance of the step function, for what problem set is it ever viable to use tanh with a Perceptron?

Upvotes: 3

Views: 2374

Answers (1)

Cleb

Reputation: 25997

As already mentioned in the comments, your learning rate is too small, so it will take a huge number of iterations to converge. In order to get comparable output, you can therefore increase n and/or lr.

If one increases lr to e.g. 0.1 (1 also works fine) and n to 10000, the results for the two activation functions look pretty much the same (see plots below), and the line

print(errors[-1], errors2[-1])

returns

(0, -8.4289020207961585e-11)
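For reference, these are the only lines that need to change in the script from the question (a sketch; the values are just the ones mentioned above, and 1 instead of 0.1 works as well):

lr = 0.1      # larger learning rate so the tanh perceptron actually converges
n = 10000     # many more passes over the training data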

If you run it again, these values might differ since there is no seed set for the random numbers.
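If you want reproducible results, you can fix NumPy's random seed before the data are generated, e.g. (the seed value 0 is arbitrary):

np.random.seed(0)   # makes the multivariate_normal samples, and hence the results, repeatable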

Here are the plots I get for the values mentioned above:

[Plots: Heaviside decision boundary and training error; Sigmoidal decision boundary and training error]

Upvotes: 2
