pnpsuM

Reputation: 11

I get loss: nan when implementing mish from scratch

I'm currently working on making a custom activation function using TF2 in Python.

model architecture: VGG-16, on CIFAR-10
epochs: 100
lr: 0.001 for the first 80 epochs, 0.0001 for the last 20 epochs
optimizer: Adam
loss: categorical cross entropy
batch_size: 128
num_cnn_list = [2,2,4,4,4]
channel_list = [16,128,256,512,512]
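
The lr drop at epoch 80 is done with a scheduler callback, roughly like this (a simplified sketch, not my exact training code):

import tensorflow as tf

def lr_schedule(epoch):
  # 0.001 for the first 80 epochs, 0.0001 for the remaining 20
  return 1e-3 if epoch < 80 else 1e-4

lr_callback = tf.keras.callbacks.LearningRateScheduler(lr_schedule)
# passed to model.fit(..., epochs=100, batch_size=128, callbacks=[lr_callback]),
# together with the Adam optimizer and categorical cross-entropy loss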

There's no problem when I use the mish activation function like this:

import tensorflow as tf
from tensorflow.keras import backend as K

@tf.function
def mish(x):
  return x * K.tanh(K.softplus(x))  # softplus(x) = log(1 + exp(x))
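
I pass the callable straight into the conv layers as their activation, e.g. (simplified, the filter count is just an example):

from tensorflow.keras import layers

# the custom function is used directly as the layer activation
conv = layers.Conv2D(16, 3, padding="same", activation=mish)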

However, when I tried implementing mish like the following:

@tf.function
def mish_inside(x):
  return K.log(1 + K.exp(x))  # hand-written softplus

@tf.function
def mish(x):
  return x * K.tanh(mish_inside(x))

Initially I get a normal loss value, but I soon get NaN (before the first epoch is done).
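
To see where the two versions diverge numerically, I compared them on a few large inputs (a standalone check, independent of the model):

import tensorflow as tf
from tensorflow.keras import backend as K

x = tf.constant([1.0, 20.0, 90.0])

naive = K.log(1 + K.exp(x))  # my hand-written softplus
builtin = K.softplus(x)      # the library op

print(naive.numpy())    # last entry is inf: exp(90) overflows float32
print(builtin.numpy())  # last entry is 90.0: computed without overflow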

This is quite crucial, because it suggests that the same function (softplus vs. ln(1 + e^x)) can produce different backpropagation values depending on how it is implemented, which would mean a new activation function has to be kept simple or built only from functions already in the library.

This works quite well with a smaller lr (0.0001), but I want it to be robust across different settings.

In the first place, it doesn't make sense that the same function returns different loss values.

Does the implementation method really matter, or am I missing something else?

This is what I found in the TensorFlow source:

def softplus(features, name=None):
  r"""TODO: add doc.

  Args:
    features: A `Tensor`. Must be one of the following types: `half`, `bfloat16`, `float32`, `float64`.
    name: A name for the operation (optional).

  Returns:
    A `Tensor`. Has the same type as `features`.
  """
  _ctx = _context._context or _context.context()
  tld = _ctx._thread_local_data
  if tld.is_eager:
    try:
      _result = pywrap_tfe.TFE_Py_FastPathExecute(
        _ctx, "Softplus", name, features)
      return _result
    except _core._NotOkStatusException as e:
      _ops.raise_from_not_ok_status(e, name)
    except _core._FallbackException:
      pass
    try:
      return softplus_eager_fallback(
          features, name=name, ctx=_ctx)
    except _core._SymbolicException:
      pass  # Add nodes to the TensorFlow graph.
  # Add nodes to the TensorFlow graph.
  _, _, _op, _outputs = _op_def_library._apply_op_helper(
        "Softplus", features=features, name=name)
  _result = _outputs[:]
  if _execute.must_record_gradient():
    _attrs = ("T", _op._get_attr_type("T"))
    _inputs_flat = _op.inputs
    _execute.record_gradient(
        "Softplus", _inputs_flat, _attrs, _result)
  _result, = _result
  return _result

Softplus = tf_export("raw_ops.Softplus")(_ops.to_raw_op(softplus))

Still, I don't have a clue, because I can't find pywrap_tfe.py.
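
What I could at least verify from the Python side is that the gradients of the two versions differ for large inputs (a quick GradientTape sketch):

import tensorflow as tf
from tensorflow.keras import backend as K

x = tf.constant([90.0])  # a large pre-activation value

with tf.GradientTape() as tape:
  tape.watch(x)
  y_naive = K.log(1 + K.exp(x))  # composed from Exp/Log ops
grad_naive = tape.gradient(y_naive, x)

with tf.GradientTape() as tape:
  tape.watch(x)
  y_builtin = K.softplus(x)  # single Softplus op with its own registered gradient
grad_builtin = tape.gradient(y_builtin, x)

print(grad_naive.numpy())    # nan: backprop through the overflowed exp gives 0 * inf
print(grad_builtin.numpy())  # 1.0: the Softplus gradient stays finite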

Upvotes: 1

Views: 177

Answers (0)
