How to define loss function in Tensorflow [Optimization problem]?

I'm trying to define a loss function and experiencing difficulties with that. Maybe someone can help me.

I have N data points (x_i, y_i) and I want to fit a straight line f(x) (for simplicity) under the following condition: find the minimal value of h such that |y_i - f(x_i)| < h for all points.

This is not the same as tf.losses.mean_squared_error or LAD (least absolute deviation), where we minimize the sum of the absolute values.

    tf_x = tf.placeholder(tf.float32, x.shape)     # input x
    tf_y = tf.placeholder(tf.float32, y.shape)     # input y

    l1 = tf.layers.dense(tf_x, 1)          # assume linear activation
    output = tf.layers.dense(l1, 1)        # output layer

    h = ???
    loss = ???
    optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
    train_op = optimizer.minimize(loss)

So sess.run() should return the predicted line and h value which satisfies the above-mentioned condition.

Thanks!

Upvotes: 1

Answers (3)

danielcahall

Reputation: 2752

It sounds like you are using Tensorflow 1.x API since you mentioned using tf.placeholder and sess.run, so I have provided the solution using the Tensorflow 1.x API from Tensorflow 2.x. If you want to run in Tensorflow 1.x, just remove compat.v1.

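    import numpy as np
    import tensorflow as tf

    # TF 2.x only: placeholders and sessions need graph mode (omit this line on TF 1.x)
    tf.compat.v1.disable_eager_execution()

    tf_x = tf.compat.v1.placeholder(tf.float32, [None, 1], name='x')  # input x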
    tf_y = tf.compat.v1.placeholder(tf.float32, [None, 1], name='y')  # input y
    h = tf.Variable(0.0, name='h')

    l1 = tf.compat.v1.layers.dense(tf_x, 1, name='layer_1')  # assume linear activation
    output = tf.compat.v1.layers.dense(l1, 1, name='output')  # output layer
    max_residual = tf.reduce_max(tf.abs(tf_y - output))  # largest |y_i - f(x_i)|
    loss = max_residual + tf.abs(h - max_residual)  # second term pulls h toward the largest residual
    optimizer = tf.compat.v1.train.GradientDescentOptimizer(learning_rate=0.1).minimize(loss)
    init = tf.compat.v1.global_variables_initializer()
    variables = tf.compat.v1.trainable_variables()

    x = np.expand_dims(np.array([5.0, 5.0], dtype=np.float32), axis=-1)
    y = np.expand_dims(np.array([2.0, 3.0], dtype=np.float32), axis=-1)

    with tf.compat.v1.Session() as sess:
        sess.run(init)

        for step in range(1000):
            _, val = sess.run([optimizer, loss],
                              feed_dict={tf_x: x, tf_y: y})
            prediction = sess.run(output, feed_dict={'x:0': x})
            print(prediction)
            if step % 5 == 0:
                print("step: {}, loss: {}".format(step, val))
            print([{variable.name: sess.run(variable)} for variable in variables])

I have included some print statements to assist with observing the training process. The loss function is a bit weird looking because of the problem statement - we're learning both the function f(x) which approximates y and the residual error h. I used dummy inputs to verify the functionality of the model - by providing two 5's with an output of 2 and 3, the model is forced to compromise and converge around predicting 2.5. From the last steps:

step: 990, loss: 0.6000000238418579
[{'h:0': 0.5}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.6000004]
 [2.6000004]]
[{'h:0': 0.6}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.6000004], dtype=float32)}]
[[2.4000003]
 [2.4000003]]
[{'h:0': 0.70000005}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.4000003]
 [2.4000003]]
[{'h:0': 0.6}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.4000003]
 [2.4000003]]
[{'h:0': 0.5}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.6000004]
 [2.6000004]]
step: 995, loss: 0.6999993324279785
[{'h:0': 0.6}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.6000004], dtype=float32)}]
[[2.4000003]
 [2.4000003]]
[{'h:0': 0.70000005}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.4000003]
 [2.4000003]]
[{'h:0': 0.6}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.4000003]
 [2.4000003]]
[{'h:0': 0.5}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.4000003], dtype=float32)}]
[[2.6000004]
 [2.6000004]]
[{'h:0': 0.6}, {'layer_1/kernel:0': array([[0.04334712]], dtype=float32)}, {'layer_1/bias:0': array([-0.2167356], dtype=float32)}, {'output/kernel:0': array([[-1.0096708e-09]], dtype=float32)}, {'output/bias:0': array([2.6000004], dtype=float32)}]

Notice the model predicts 2.4-2.6 for the inputs and for h, the estimate is between 0.5-0.7, which is close to the actual residual errors (0.4-0.6). The behavior may change with real data - specifically, with real data there may not be duplicate inputs with different outputs, which is confusing for a model. To sanity check, we can run again with the same outputs, but change the input to 7:

step: 995, loss: 1.9000002145767212
[{'h:0': 1.8000002}, {'layer_1/kernel:0': array([[0.60248166]], dtype=float32)}, {'layer_1/bias:0': array([0.21199825], dtype=float32)}, {'output/kernel:0': array([[1.0599916]], dtype=float32)}, {'output/bias:0': array([0.2], dtype=float32)}]
[[-0.767429 ]
 [-1.0744007]]
[{'h:0': 1.9000002}, {'layer_1/kernel:0': array([[-0.88150656]], dtype=float32)}, {'layer_1/bias:0': array([-6.8724134e-08], dtype=float32)}, {'output/kernel:0': array([[0.1741176]], dtype=float32)}, {'output/bias:0': array([0.], dtype=float32)}]
[[3.543093]
 [4.895095]]
[{'h:0': 2.0000002}, {'layer_1/kernel:0': array([[-0.6377419]], dtype=float32)}, {'layer_1/bias:0': array([0.03482345], dtype=float32)}, {'output/kernel:0': array([[-1.0599916]], dtype=float32)}, {'output/bias:0': array([0.2], dtype=float32)}]
[[3.543093]
 [4.895095]]
[{'h:0': 1.9000002}, {'layer_1/kernel:0': array([[-0.6377419]], dtype=float32)}, {'layer_1/bias:0': array([0.03482345], dtype=float32)}, {'output/kernel:0': array([[-1.0599916]], dtype=float32)}, {'output/bias:0': array([0.2], dtype=float32)}]
[[3.543093]
 [4.895095]]
[{'h:0': 1.8000002}, {'layer_1/kernel:0': array([[-0.6377419]], dtype=float32)}, {'layer_1/bias:0': array([0.03482345], dtype=float32)}, {'output/kernel:0': array([[-1.0599916]], dtype=float32)}, {'output/bias:0': array([0.2], dtype=float32)}]

It's fairly accurate, as the residual error is about 2.1 (7 - 4.89) and h is output as 1.8.

It's worth noting some additional pieces may be required for this loss function, but this should be a starting point. One thing that should not be needed is bounding output: tf.reduce_max(tf.abs(tf_y - output)) is non-negative, and so is the second term, so the loss is bounded below by zero and a diverging output raises it rather than driving it to negative infinity.
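
To see what this loss does to h, it can help to evaluate it by hand outside TensorFlow. The sketch below is plain Python and not part of the code above; `combined_loss` is a hypothetical helper in which `m` stands for tf.reduce_max(tf.abs(tf_y - output)). It illustrates that m + |h - m| is the same as max(h, 2m - h), so for a fixed m the loss is smallest when h equals m.

```python
def combined_loss(m, h):
    """Mirror of the loss above, with m = largest absolute residual."""
    return m + abs(h - m)


m = 0.5  # largest residual when predicting 2.5 for targets 2 and 3
for h in (0.3, 0.4, 0.5, 0.6, 0.7):
    # The two values agree up to floating-point rounding:
    # m + |h - m| == max(h, 2*m - h)
    print(h, combined_loss(m, h), max(h, 2 * m - h))
```

With m = 0.5 the minimum sits at h = 0.5, which is consistent with the range h moves in during the first run above.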

Upvotes: 1

Yaoshiang

Reputation: 1941

You are looking for the L-inf norm of the delta between y_true and y_pred across all data points. The L-inf norm only takes its loss from the data point with the largest deviation. If you could optimize for that, you'd find the minimum h.

Of course, L-inf is not differentiable everywhere, since it's just a mathematical way to express "max", and its gradient only comes from the single worst data point. So you can approximate it with an Ln norm where n is large. You can grid search for an n that stays numerically stable and try other tricks like gradient clipping.
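
A small numerical illustration of that approximation, in plain Python rather than TensorFlow. `ln_norm` is a hypothetical helper, and the residual values are made up for the example. Dividing by the largest residual before raising to the power n is one possible way to keep large n from overflowing or underflowing; it is an assumption of this sketch, not something prescribed above.

```python
def ln_norm(residuals, n):
    """(sum |r_i|^n)^(1/n), scaled by the largest |r_i| for numerical stability."""
    largest = max(abs(r) for r in residuals)
    if largest == 0.0:
        return 0.0
    # Every ratio is at most 1, so raising it to a large n cannot overflow.
    return largest * sum((abs(r) / largest) ** n for r in residuals) ** (1.0 / n)


residuals = [0.1, -0.4, 0.25, 0.5]  # illustrative values of y_i - f(x_i)
l_inf = max(abs(r) for r in residuals)
for n in (2, 8, 32, 128):
    # The Ln norm shrinks toward the L-inf norm as n grows.
    print(n, ln_norm(residuals, n), l_inf)
```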

I also suspect that a schedule of losses would help the training process: start with L2, then gradually increase n to L3, L4, L5, and so on, so that the loss approaches L-inf step by step.
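
A minimal sketch of such a schedule, using plain-Python gradient descent on a straight line instead of TensorFlow. Everything here is an assumption made for illustration: the helper names, the exponent sequence 2, 4, 8, 16, 32, the step count, the learning rate and the three data points. It is meant to show the idea, not a tuned training recipe.

```python
def ln_norm(residuals, n):
    """(sum |r_i|^n)^(1/n), scaled by the largest |r_i| for numerical stability."""
    largest = max(abs(r) for r in residuals)
    if largest == 0.0:
        return 0.0
    return largest * sum((abs(r) / largest) ** n for r in residuals) ** (1.0 / n)


def fit_line_with_schedule(xs, ys, exponents=(2, 4, 8, 16, 32),
                           steps_per_stage=200, learning_rate=0.01):
    """Gradient descent on the Ln norm of the residuals, raising n stage by stage."""
    a, b = 0.0, 0.0  # slope and intercept of f(x) = a * x + b
    for n in exponents:
        for _ in range(steps_per_stage):
            residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
            norm = ln_norm(residuals, n)
            if norm == 0.0:
                return a, b  # perfect fit, nothing left to do
            # d(norm)/d(r_i) = sign(r_i) * (|r_i| / norm)^(n - 1)
            grads = [(1.0 if r >= 0 else -1.0) * (abs(r) / norm) ** (n - 1)
                     for r in residuals]
            # r_i = y_i - (a * x_i + b), so dr_i/da = -x_i and dr_i/db = -1
            grad_a = -sum(g * x for g, x in zip(grads, xs))
            grad_b = -sum(grads)
            a -= learning_rate * grad_a
            b -= learning_rate * grad_b
    return a, b


xs = [-1.0, 0.0, 1.0]
ys = [0.0, 1.0, 0.0]
a, b = fit_line_with_schedule(xs, ys)
h = max(abs(y - (a * x + b)) for x, y in zip(xs, ys))
print("a = {:.3f}, b = {:.3f}, h = {:.3f}".format(a, b, h))
```

For these three points the exact minimax line is f(x) = 0.5 with h = 0.5, so the printed h should land near that value, and below the roughly 0.67 that an L2-only fit gives.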

Upvotes: 0

ASH

Reputation: 20352

Not sure if this helps, but there is a scipy.optimize package, which provides several commonly used optimization algorithms. Here is a link to the documentation.

https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html

I've been using this a lot recently, and the results are fantastic!
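
For the straight-line case in the question, one way to use that package is to write the problem as a linear program and hand it to scipy.optimize.linprog. This is a sketch under assumptions: `fit_minimax_line` is a hypothetical helper name, the line is f(x) = a*x + b, and the strict inequality from the question is relaxed to |y_i - f(x_i)| <= h, since a linear program cannot express a strict bound.

```python
import numpy as np
from scipy.optimize import linprog


def fit_minimax_line(x, y):
    """Return (a, b, h) minimising h subject to |y_i - (a*x_i + b)| <= h."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ones = np.ones_like(x)
    # Decision variables are [a, b, h]; the objective is just h.
    c = [0.0, 0.0, 1.0]
    # y_i - a*x_i - b <= h   ->  -a*x_i - b - h <= -y_i
    # a*x_i + b - y_i <= h   ->   a*x_i + b - h <=  y_i
    A_ub = np.vstack([
        np.column_stack([-x, -ones, -ones]),
        np.column_stack([x, ones, -ones]),
    ])
    b_ub = np.concatenate([-y, y])
    # linprog defaults to non-negative variables, so free a and b explicitly.
    bounds = [(None, None), (None, None), (0, None)]
    result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    if not result.success:
        raise RuntimeError(result.message)
    a, b, h = result.x
    return a, b, h


a, b, h = fit_minimax_line([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
print(a, b, h)
```

For the three points used here the residuals can be balanced by hand (+h, -h, +h pattern), which gives a = 0, b = 0.5 and h = 0.5, so the solver's output can be checked against that.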

Upvotes: 0
