Reputation: 11
I'm currently working on making a custom activation function with TensorFlow 2 in Python. My setup:
model architecture: VGG 16, on CIFAR-10
epochs: 100
lr: 0.001 for the first 80 epochs, 0.0001 for the last 20 epochs
optimizer: Adam
loss: categorical cross entropy
batch_size: 128
num_cnn_list = [2,2,4,4,4]
channel_list = [16,128,256,512,512]
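For reference, this is roughly how I wire those hyperparameters together (a simplified sketch; build_vgg16, x_train and y_train are placeholders for my actual model-building and data-loading code):

import tensorflow as tf

def lr_schedule(epoch, lr):
    # 0.001 for the first 80 epochs, 0.0001 for the last 20
    return 1e-3 if epoch < 80 else 1e-4

model = build_vgg16()  # placeholder: builds VGG-16 from num_cnn_list / channel_list
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train,  # placeholder: one-hot encoded CIFAR-10
          batch_size=128,
          epochs=100,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule)])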
There's no problem when I use the mish activation function like this:
import tensorflow as tf
from tensorflow.keras import backend as K

@tf.function
def mish(x):
    return x * K.tanh(K.softplus(x))  # softplus(x) = K.log(1 + K.exp(x))
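The custom function can then be passed directly as the activation of a Keras layer; a minimal illustrative snippet (not my full VGG-16 code):

from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(16, 3, padding="same", activation=mish)(inputs)
x = layers.MaxPooling2D()(x)
demo_model = tf.keras.Model(inputs, x)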
However, when I tried implementing mish like the following:
@tf.function
def mish_inside(x):
    return K.log(1 + K.exp(x))

@tf.function
def mish(x):
    return x * K.tanh(mish_inside(x))
Initially I get a normal loss value, but I soon get NaN (before the first epoch is done).
This matters quite a lot, because it suggests that the same mathematical function (softplus vs. ln(1 + e^x)) can produce different values during backpropagation depending on how it is implemented, which would mean a new activation function has to be kept simple or built only from functions that are already in the library.
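My suspicion is that the hand-written version overflows in float32 once the pre-activations get large, while the built-in op stays finite; a quick check along these lines should show the difference (just a sketch, not part of the training code):

import tensorflow as tf
from tensorflow.keras import backend as K

x = tf.constant([10.0, 50.0, 100.0])   # float32 by default
print(K.softplus(x).numpy())           # built-in op stays finite
print(K.log(1.0 + K.exp(x)).numpy())   # exp(100.0) overflows float32, giving inf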
This works quite well with a smaller lr (0.0001), but I want it to be robust across other settings.
In the first place, it doesn't make sense that the same function returns different loss values.
Does the implementation method really matter, or am I missing something else?
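For what it's worth, if it really is just a float32 range problem, a manual rewrite using the identity softplus(x) = max(x, 0) + log1p(exp(-|x|)) should avoid the overflow. This is only a sketch I haven't validated on the full training run:

import tensorflow as tf

@tf.function
def stable_softplus(x):
    # equivalent to log(1 + exp(x)) but never exponentiates a large positive value
    return tf.maximum(x, 0.0) + tf.math.log1p(tf.exp(-tf.abs(x)))

@tf.function
def mish_stable(x):
    return x * tf.tanh(stable_softplus(x))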
This is what I found in the TensorFlow source directory:
def softplus(features, name=None):
  r"""TODO: add doc.

  Args:
    features: A `Tensor`. Must be one of the following types: `half`, `bfloat16`, `float32`, `float64`.
    name: A name for the operation (optional).

  Returns:
    A `Tensor`. Has the same type as `features`.
  """
  _ctx = _context._context or _context.context()
  tld = _ctx._thread_local_data
  if tld.is_eager:
    try:
      _result = pywrap_tfe.TFE_Py_FastPathExecute(
          _ctx, "Softplus", name, features)
      return _result
    except _core._NotOkStatusException as e:
      _ops.raise_from_not_ok_status(e, name)
    except _core._FallbackException:
      pass
    try:
      return softplus_eager_fallback(
          features, name=name, ctx=_ctx)
    except _core._SymbolicException:
      pass  # Add nodes to the TensorFlow graph.
  # Add nodes to the TensorFlow graph.
  _, _, _op, _outputs = _op_def_library._apply_op_helper(
      "Softplus", features=features, name=name)
  _result = _outputs[:]
  if _execute.must_record_gradient():
    _attrs = ("T", _op._get_attr_type("T"))
    _inputs_flat = _op.inputs
    _execute.record_gradient(
        "Softplus", _inputs_flat, _attrs, _result)
  _result, = _result
  return _result

Softplus = tf_export("raw_ops.Softplus")(_ops.to_raw_op(softplus))
Still, I don't have a clue, because I can't find pywrap_tfe.py.
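As a further check, I was planning to compare the gradients of the two formulations directly with tf.GradientTape, something like this (sketch only):

import tensorflow as tf
from tensorflow.keras import backend as K

x = tf.constant([1.0, 50.0, 100.0])

with tf.GradientTape() as tape:
    tape.watch(x)
    y_builtin = K.softplus(x)
print(tape.gradient(y_builtin, x).numpy())   # stays within [0, 1] (sigmoid of x)

with tf.GradientTape() as tape:
    tape.watch(x)
    y_manual = K.log(1.0 + K.exp(x))
print(tape.gradient(y_manual, x).numpy())    # can contain inf/nan once exp(x) overflows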
Upvotes: 1
Views: 177