Reputation: 99
I'm trying to define a loss function and am having difficulties with it. Maybe someone can help me.
I have N data points (x_i, y_i) and I want to fit a straight line f(x) (for simplicity) under the following condition: find the minimal value of h such that |y_i - f(x_i)| < h for all points. This is not tf.losses.mean_squared_error, and it is not LAD (least absolute deviations) either, where the sum of the absolute values is minimized.
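In other words, the goal is the minimax (Chebyshev) fit: minimize h over f and h subject to |y_i - f(x_i)| < h for all i, which is equivalent to minimizing max_i |y_i - f(x_i)| over f.
Here is my skeleton so far: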
tf_x = tf.placeholder(tf.float32, x.shape) # input x
tf_y = tf.placeholder(tf.float32, y.shape) # input y
l1 = tf.layers.dense(tf_x, 1) # assume linear activation
output = tf.layers.dense(l1, 1) # output layer
h = ???
loss = ???
optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(loss)
Running the graph with sess.run() should then return the predicted line and the value of h that satisfies the condition above.
Thanks!
Upvotes: 1
Views: 413
Reputation: 2742
It sounds like you are using the TensorFlow 1.x API, since you mention tf.placeholder and sess.run, so I have written the solution with the TensorFlow 1.x compatibility API (tf.compat.v1) from TensorFlow 2.x. If you want to run it under TensorFlow 1.x, just remove compat.v1.
import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # required in TF 2.x to use placeholders and sessions

tf_x = tf.compat.v1.placeholder(tf.float32, [None, 1], name='x')  # input x
tf_y = tf.compat.v1.placeholder(tf.float32, [None, 1], name='y')  # input y
h = tf.Variable(0.0, name='h')
l1 = tf.compat.v1.layers.dense(tf_x, 1, name='layer_1') # assume linear activation
output = tf.compat.v1.layers.dense(l1, 1, name='output') # output layer
# First term: the largest absolute residual, i.e. the smallest feasible h for the current fit.
# Second term: pulls the variable h toward that largest residual so it can be read out after training.
loss = tf.reduce_max(tf.abs(tf_y - output)) + tf.abs((h - tf.reduce_max(tf.abs(tf_y - output))))
optimizer = tf.compat.v1.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)
init = tf.compat.v1.global_variables_initializer()
variables = tf.compat.v1.trainable_variables()
x = np.expand_dims(np.array([5.0, 5.0], dtype=np.float32), axis=-1)
y = np.expand_dims(np.array([2.0, 3.0], dtype=np.float32), axis=-1)
with tf.compat.v1.Session() as sess:
    sess.run(init)
    for step in range(1000):
        _, val = sess.run([optimizer, loss],
                          feed_dict={tf_x: x, tf_y: y})
        prediction = sess.run(output, feed_dict={'x:0': x})
        print(prediction)
        if step % 5 == 0:
            print("step: {}, loss: {}".format(step, val))
        print([{variable.name: sess.run(variable)} for variable in variables])
I have included some print statements to assist with observing the training process. The loss function is a bit weird looking because of the problem statement - we're learning both the function f(x) which approximates y and the residual error h. I used dummy inputs to verify the functionality of the model - by providing two 5's with outputs of 2 and 3, the model is forced to compromise and converge around predicting 2.5. From the last steps:
step: 990, loss: 0.6000000238418579
[{'h:0': 0.5}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.6000004]
[2.6000004]]
[{'h:0': 0.6}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.6000004], dtype=float32)}]
[[2.4000003]
[2.4000003]]
[{'h:0': 0.70000005}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.4000003]
[2.4000003]]
[{'h:0': 0.6}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.4000003]
[2.4000003]]
[{'h:0': 0.5}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.6000004]
[2.6000004]]
step: 995, loss: 0.6999993324279785
[{'h:0': 0.6}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.6000004], dtype=float32)}]
[[2.4000003]
[2.4000003]]
[{'h:0': 0.70000005}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.4000003]
[2.4000003]]
[{'h:0': 0.6}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.4000003]
[2.4000003]]
[{'h:0': 0.5}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.6000004]
[2.6000004]]
[{'h:0': 0.6}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.6000004], dtype=float32)}]
Notice the model predicts 2.4-2.6 for the inputs, and the estimate for h oscillates between 0.5 and 0.7, which is close to the actual residuals (0.4-0.6). The behavior may change with real data - in particular, real data is unlikely to contain duplicate inputs with different outputs, which is confusing for a model. As a sanity check, we can run again with the same outputs but change the inputs to 7.
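Only the input line changes (the y values stay the same), i.e. something like:
x = np.expand_dims(np.array([7.0, 7.0], dtype=np.float32), axis=-1)
The last steps of that run look like this: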
step: 995, loss: 1.9000002145767212
[{'h:0': 1.8000002}, {'layer_1/kernel:0': array([[0.60248166]], dtype=float32)}, {'layer_1/bias:0': array([0.21199825], dtype=float32)}, {'output/kernel:0': array([[1.0599916]], dtype=float32)}, {'output/bias:0': array([0.2], dtype=float32)}]
[[-0.767429 ]
[-1.0744007]]
[{'h:0': 1.9000002}, {'layer_1/kernel:0': array([[-0.88150656]], dtype=float32)}, {'layer_1/bias:0': array([-6.8724134e-08], dtype=float32)}, {'output/kernel:0': array([[0.1741176]], dtype=float32)}, {'output/bias:0': array([0.], dtype=float32)}]
[[3.543093]
[4.895095]]
[{'h:0': 2.0000002}, {'layer_1/kernel:0': array([[-0.6377419]], dtype=float32)}, {'layer_1/bias:0': array([0.03482345], dtype=float32)}, {'output/kernel:0': array([[-1.0599916]], dtype=float32)}, {'output/bias:0': array([0.2], dtype=float32)}]
[[3.543093]
[4.895095]]
[{'h:0': 1.9000002}, {'layer_1/kernel:0': array([[-0.6377419]], dtype=float32)}, {'layer_1/bias:0': array([0.03482345], dtype=float32)}, {'output/kernel:0': array([[-1.0599916]], dtype=float32)}, {'output/bias:0': array([0.2], dtype=float32)}]
[[3.543093]
[4.895095]]
[{'h:0': 1.8000002}, {'layer_1/kernel:0': array([[-0.6377419]], dtype=float32)}, {'layer_1/bias:0': array([0.03482345], dtype=float32)}, {'output/kernel:0': array([[-1.0599916]], dtype=float32)}, {'output/bias:0': array([0.2], dtype=float32)}]
It's fairly accurate: the largest residual is about 1.9 (a target of 3 against a prediction of roughly 4.9), and h is output as 1.8.
It's worth noting that some additional pieces may be required for this loss function - for example, bounding output or clipping gradients, since the output layer is linear and unbounded, and tf.reduce_max(tf.abs(tf_y - output)) produces very large losses and gradients when the predictions diverge - but this should be a starting point.
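For example, one way to add such a safeguard is to clip the gradients before applying them - a minimal sketch, not part of the original code, with an arbitrary clip norm of 1.0:
opt = tf.compat.v1.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = opt.compute_gradients(loss)
# Clip each gradient's norm so a single huge residual cannot blow up the update
clipped = [(tf.clip_by_norm(g, 1.0), v) for g, v in grads_and_vars if g is not None]
optimizer = opt.apply_gradients(clipped)  # drop-in replacement for the earlier `optimizer` op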
Upvotes: 1
Reputation: 1941
You are looking for the L-inf norm of the difference between y_true and y_pred: it measures only the maximally diverging data point. If you could optimize for that directly, you'd find the minimum h.
Of course, L-inf is not smoothly differentiable, since it is essentially just "max", and the gradient only flows through the single worst point. So you can approximate it with an Ln norm where n is large. You can grid-search for an n that stays numerically stable and try other tricks like gradient clipping.
Also, I suspect that approximating L-inf with a schedule of losses - starting with L2 and gradually increasing n to L3, L4, L5, and so on - would help the training process.
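A minimal sketch of that idea (the exponent n here is an assumed starting value, meant to be increased over training):
def ln_norm_loss(y_true, y_pred, n=4.0):
    # Ln norm of the residuals; as n grows this approaches the L-inf norm,
    # i.e. the maximum absolute residual (the h the question asks for).
    abs_err = tf.abs(y_true - y_pred)
    return tf.reduce_sum(abs_err ** n) ** (1.0 / n)

loss = ln_norm_loss(tf_y, output, n=4.0)  # increase n (e.g. 2 -> 4 -> 8) as training stabilizes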
Upvotes: 0
Reputation: 20302
Not sure if this helps, but there is a scipy.optimize package, which provides several commonly used optimization algorithms. Here is a link to the documentation.
https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html
I've been using this a lot recently, and the results are fantastic!
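For the problem in the question, a minimal sketch with scipy.optimize might look like this (the data is made up for illustration; Nelder-Mead is used because the max-residual objective is not smooth):
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical data
y = np.array([1.1, 1.9, 3.2, 3.8])

def max_abs_residual(params):
    a, b = params
    # h for the line a*x + b: the largest absolute residual over all points
    return np.max(np.abs(y - (a * x + b)))

res = minimize(max_abs_residual, x0=[0.0, 0.0], method='Nelder-Mead')
a, b = res.x
h = res.fun   # minimal h found
print(a, b, h)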
Upvotes: 0