Paul

Reputation: 906

TensorFlow: square root activation function implementation (shape error)

While implementing a classification NN, I found some really useful tutorials, like this one (two hidden layers, one-hot-encoded output, dropout regularization, normalization, etc.), which helped me with part of the learning curve behind the TensorFlow API. However, after reading the publication on SQRT activation functions and seeing the optimistic feedback, I would like to experiment with one in my NN architecture.

After not finding it in the TensorFlow API, I looked at how to define custom activation functions, found this Stack Overflow solution, and figured that it should be possible to implement with TensorFlow primitives.

The SQRT activation function needs to be this piecewise function:

    f(x) = sqrt(x)       if x >= 0
    f(x) = -sqrt(|x|)    if x < 0

I inserted this code in place of the hidden layer's ReLU function:

# ==== old activation function
# b = bias value x bias weight
# inputs = x data 
# w = weights
y = tf.nn.relu( tf.add( tf.matmul(w, tf.transpose(inputs)), b))

# ===== new act function
net = tf.cast( tf.add( tf.matmul(w, tf.transpose(inputs)), b), tf.float32)  # net input to activation function
cond = tf.greater_equal(net, tf.constant(0.0, dtype=tf.float32))            # >= condition
condTrue = tf.sqrt(net)                                   # if True
minOne = tf.constant(-1.0, shape=(N, 1), dtype=tf.float32)  # -1 constant value
condFalse = tf.matmul(minOne, tf.sqrt( tf.abs(net)))      # if False
y = tf.cond(cond, lambda: condTrue, lambda: condFalse)    # act. function output 

But when I try to run this code, I get a shape error:

ValueError("Dimensions must be equal, but are 1 and 107 for 'MatMul_2' (op: 'MatMul') with input shapes: [107,1], [107,?].",)

Could someone please have a look at the code snippet and tell me whether my approach is correct? Apart from the error pointing to a shape mismatch between the inputs, I suspect my bigger problem is still wrapping my head around TensorFlow's matrix-based operators.

Between all the multiplications, additions, and transposes, I lose track of what the underlying shapes of the tensors must be. Will my code correctly define the intended activation function (and what about the back-prop derivative?), and if not, where and how did I go wrong?

Any help would be appreciated; I would like to understand the problem better, as I'm still learning the API.

Upvotes: 1

Views: 1109

Answers (1)

Vijay Mariappan

Reputation: 17201

You can use simpler logic to implement the activation function:

x = tf.constant([-4, 4, -2, 2, 0], tf.float32)
act = tf.sign(x) * tf.sqrt(tf.abs(x))  # signed square root, element-wise

with tf.Session() as sess:
    print(sess.run(act))

#[-2.  2. -1.4142135 1.4142135 0. ]
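
On the back-prop derivative asked about in the question: TensorFlow differentiates this expression automatically, and for x != 0 the gradient works out to 1/(2*sqrt(|x|)). Here is a minimal sketch of a gradient check (the sample values are hypothetical; x = 0 is avoided because the derivative diverges there and autodiff yields NaN):

import tensorflow as tf

x = tf.constant([-4.0, -2.0, 2.0, 4.0])
act = tf.sign(x) * tf.sqrt(tf.abs(x))  # signed square root activation
grad = tf.gradients(act, x)[0]         # autodiff: 1/(2*sqrt(|x|)) for x != 0

with tf.Session() as sess:
    print(sess.run(grad))  # ~[0.25 0.3536 0.3536 0.25]

In the question's layer this replaces the ReLU line directly, e.g. y = tf.sign(net) * tf.sqrt(tf.abs(net)) where net = tf.add(tf.matmul(w, tf.transpose(inputs)), b); since every op is element-wise, no tf.cond or second tf.matmul is needed, which also removes the shape error.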

Upvotes: 1
