Reputation: 33
I was trying to implement linear regression in Keras/TensorFlow and was very surprised at how difficult it is. The standard examples work great on random data. However, if we change the input data a little bit, all of the examples stop working correctly.
I am trying to find the coefficients for y = 0.5 * x1 + 0.5 * x2.
import numpy as np
from sklearn import preprocessing
from tensorflow import keras

np.random.seed(1443)
n = 100000
x = np.zeros((n, 2))
y = np.zeros((n, 1))
# Two scaled Poisson features, each sorted independently
x[:, 0] = sorted(preprocessing.scale(np.random.poisson(1000000, (n))))
x[:, 1] = sorted(preprocessing.scale(np.random.poisson(1000000, (n))))
y = (x[:, 0] + x[:, 1]) / 2

model = keras.Sequential()
model.add(keras.layers.Dense(1, input_shape=(2,), dtype="float32"))
model.compile(loss='mean_squared_error', optimizer='sgd')
model.fit(x, y, epochs=1000, batch_size=64)
print(model.get_weights())
The results:
| epochs | batch_size | bias      | x1         | x2         |
|--------|------------|-----------|------------|------------|
| 1000   | 64         | -5.83E-05 | 0.90410435 | 0.09594361 |
| 1000   | 1024       | -5.71E-06 | 0.98739249 | 0.01258729 |
| 1000   | 10000      | -3.07E-07 | -0.2441376 | 1.2441349  |
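For reference, the data itself admits an exact least-squares solution at [0.5, 0.5] (y is constructed as exactly (x1 + x2) / 2). This is a small sanity check I added, not part of the original post, using NumPy's closed-form solver on the same x and y:

import numpy as np

# Sanity check (not in the original post): the closed-form least-squares fit
# on the same data recovers the intended coefficients.
X_design = np.column_stack([np.ones(n), x])          # prepend an intercept column
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)  # [bias, w1, w2]
print(coef)                                          # expected to be close to [0.0, 0.5, 0.5]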
My first thought was that this was a bug in Keras, so I tried the R/TensorFlow library:
library(tensorflow)

floatType <- "float32"
p <- 2L

# Linear model: Y_hat = X %*% W + b
X <- tf$placeholder(floatType, shape = shape(NULL, p), name = "x-data")
Y <- tf$placeholder(floatType, name = "y-data")
W <- tf$Variable(tf$zeros(list(p, 1L), dtype = floatType))
b <- tf$Variable(tf$zeros(list(1L), dtype = floatType))
Y_hat <- tf$add(tf$matmul(X, W), b)

cost <- tf$reduce_mean(tf$square(Y_hat - Y))
generator <- tf$train$GradientDescentOptimizer(learning_rate = 0.01)
optimizer <- generator$minimize(cost)

session <- tf$Session()
session$run(tf$global_variables_initializer())

# Same data as in the Python version: two sorted, scaled Poisson features
set.seed(1443)
n <- 10^5
x <- matrix(replicate(p, sort(scale(rpois(n, 10^6)))), nrow = n)
y <- matrix((x[, 1] + x[, 2]) / 2)

# Mini-batch gradient descent
i <- 1
batch_size <- 10000
epoch_number <- 1000
iterationNumber <- n * epoch_number / batch_size
while (iterationNumber > 0) {
  feed_dict <- dict(X = x[i:(i + batch_size - 1), , drop = F],
                    Y = y[i:(i + batch_size - 1), , drop = F])
  session$run(optimizer, feed_dict = feed_dict)
  i <- i + batch_size
  if (i > n - batch_size)
    i <- i %% batch_size
  iterationNumber <- iterationNumber - 1
}

# Compare against R's closed-form linear regression
r_model <- lm(y ~ x)
tf_coef <- c(session$run(b), session$run(W))
r_coef <- r_model$coefficients
print(rbind(tf_coef, r_coef))
The results:
| epochs | batch_size | bias      | x1        | x2        |
|--------|------------|-----------|-----------|-----------|
| 2000   | 64         | -1.33E-06 | 0.500307  | 0.4996932 |
| 1000   | 1000       | 2.79E-08  | 0.5000809 | 0.499919  |
| 1000   | 10000      | -4.33E-07 | 0.5004921 | 0.499507  |
| 1000   | 100000     | 2.96E-18  | 0.5       | 0.5       |
TensorFlow finds the correct result only when the batch size equals the number of samples (batch_size = n) and the optimizer is plain SGD. With the "adam" or "adagrad" optimizers the errors were much larger.
Could you recommend any approaches to solve this problem with a precision of 1E-07 for Keras or TensorFlow?
Comment 1 (based on the answer below): shuffling the training dataset significantly improves the results of the TensorFlow version:
shuffledIndex <- sample(1:nrow(x))
x <- x[shuffledIndex, ]
y <- y[shuffledIndex, , drop = FALSE]
For batch size = 2000:
| (Intercept)   | x1        | x2        |
|---------------|-----------|-----------|
| -1.130693e-09 | 0.5000004 | 0.4999989 |
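A minimal Python equivalent of that shuffle (my own sketch, not from the original post), useful when feeding the mini-batches manually instead of relying on Keras' built-in shuffling in fit:

import numpy as np

# Shuffle x and y with the same permutation so the (x, y) pairs stay aligned.
perm = np.random.permutation(len(x))
x_shuffled = x[perm]
y_shuffled = y[perm]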
Upvotes: 3
Views: 645
Reputation: 33450
The problem is that you are sorting the generated random numbers for each feature. As a result, the two features end up very close to each other:
>>> np.mean(np.abs(x[:,0]-x[:,1]))
0.004125721684553685
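Another way to see how close the two sorted features are (my own check, not in the original answer) is their correlation:

import numpy as np

# After sorting, the two features are almost perfectly correlated, i.e. nearly collinear.
print(np.corrcoef(x[:, 0], x[:, 1])[0, 1])  # expected to be extremely close to 1.0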
As a result, since x1 ≈ x2, we would have:
y = (x1 + x2) / 2
~= (x1 + x1) / 2
= x1
= 0.5 * x1 + 0.5 * x1
= 0.3 * x1 + 0.7 * x1
= -0.3 * x1 + 1.3 * x1
= 10.1 * x1 - 9.1 * x1
= thousands of other possible combinations
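You can confirm numerically that these combinations are almost indistinguishable on this data. Here is a small illustration I added (not in the original answer) computing the MSE of y ≈ w1 * x1 + w2 * x2 for a few of the coefficient pairs listed above:

import numpy as np

# Because x1 ~= x2, many (w1, w2) pairs with w1 + w2 = 1 fit y with an MSE that is
# small compared to the variance of y (about 1 after scaling); the more extreme the
# pair, the larger the residual coming from the x1 - x2 mismatch.
for w1, w2 in [(0.5, 0.5), (0.3, 0.7), (-0.3, 1.3), (10.1, -9.1)]:
    y_hat = w1 * x[:, 0] + w2 * x[:, 1]
    print(w1, w2, np.mean((y_hat - y) ** 2))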
In this case, the solution that Keras converges to really depends on the initial values of the weights and the bias of the Dense layer. With different initial values you get different results (and for some of them it may not converge at all):
# set the initial weight of Dense layer
model.layers[0].set_weights([np.array([[0], [1]]), np.array([0])])
# fit the model ...
# the final weights
model.get_weights()
[array([[0.00203656],
[0.9981099 ]], dtype=float32),
array([4.5520876e-05], dtype=float32)] # because: y = 0 * x1 + 1 * x1 = x1 ~= (x1 + x2) / 2
# again set the weights to something different
model.layers[0].set_weights([np.array([[0], [0]]), np.array([1])])
# fit the model...
# the final weights
model.get_weights()
[array([[0.49986306],
[0.50013727]], dtype=float32),
array([1.4176634e-08], dtype=float32)] # the one you were looking for!
However, if you don't sort the features (i.e. just remove sorted), it is very likely that the converged weights will be very close to [0.5, 0.5].
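For completeness, here is a minimal sketch of that unsorted variant (my own rewrite of the question's code, not from the original answer); fewer epochs are used since convergence is fast once the features are no longer nearly collinear:

import numpy as np
from sklearn import preprocessing
from tensorflow import keras

np.random.seed(1443)
n = 100000
x = np.zeros((n, 2))
# Same data generation as in the question, but without sorting,
# so x[:, 0] and x[:, 1] are no longer nearly identical.
x[:, 0] = preprocessing.scale(np.random.poisson(1000000, n))
x[:, 1] = preprocessing.scale(np.random.poisson(1000000, n))
y = (x[:, 0] + x[:, 1]) / 2

model = keras.Sequential()
model.add(keras.layers.Dense(1, input_shape=(2,), dtype="float32"))
model.compile(loss="mean_squared_error", optimizer="sgd")
model.fit(x, y, epochs=10, batch_size=64)
print(model.get_weights())  # weights expected to end up very close to [[0.5], [0.5]]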
Upvotes: 3