Reputation: 332
As a beginner with RNNs, I'm currently building a 3-to-1 autocompletion RNN model for 4-letter words, where the input is a 3-letter incomplete word and the output is the single letter that completes it. For example, given the input 'CAR' I would want the model to predict 'E'.
To get the desired result from an RNN model, I have made an (imbalanced) dataset as follows:
import string
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
alphList = list(string.ascii_uppercase)              # list of uppercase letters
alphToNum = {n: i for i, n in enumerate(alphList)}   # map letter -> index
# Make dataset
# define words of interest
fourList = ['CARE', 'CODE', 'COME', 'CANE', 'COPE', 'FISH', 'JAZZ', 'GAME', 'WALK', 'QUIZ']
# (len(Sequence), len(Batch), len(Observation)) following tensorflow-style
first3Data = np.zeros((3, len(fourList), len(alphList)), dtype=np.int32)
last1Data = np.zeros((len(fourList), len(alphList)), dtype=np.int32)
for idxObs, word in enumerate(fourList):
    # Array of one-hot vectors for the first 3 letters
    first3 = [alphToNum[n] for n in word[:-1]]
    first3Data[:, idxObs, :] = np.eye(len(alphList))[first3]
    # One-hot vector for the last letter
    last1 = alphToNum[word[3]]
    last1Data[idxObs, :] = np.eye(len(alphList))[last1]
So fourList contains the words used for training, first3Data contains the one-hot encoded first 3 letters of every word, and last1Data contains the one-hot encoded last letter of every word.
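For reference, a quick sanity check of the shapes and the encoding (the decoding below is only for illustration) gives:
# Sanity check (illustrative only): shapes and one decoded example
print(first3Data.shape)   # (3, 10, 26) = (sequence, batch, one-hot)
print(last1Data.shape)    # (10, 26) = (batch, one-hot)
first3Letters = ''.join(alphList[np.argmax(first3Data[t, 0])] for t in range(3))
lastLetter = alphList[np.argmax(last1Data[0])]
print(first3Letters, '->', lastLetter)   # CAR -> E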
Following the standard setting of a 3-to-1 RNN model, I have written the following code.
# Hyperparameters
n_data = len(fourList)
n_input = len(alphList) # number of input units
n_hidden = 128 # number of hidden units
n_output = len(alphList) # number of output units
learning_rate = 0.01
total_epoch = 100000
# Variables (separate version)
W_in = tf.Variable(tf.random_normal([n_input, n_hidden]))
W_rec = tf.Variable(tf.random_normal([n_hidden, n_hidden]))
b_rec = tf.Variable(tf.random_normal([n_hidden]))
W_out = tf.Variable(tf.random_normal([n_hidden, n_output]))
b_out = tf.Variable(tf.random_normal([n_output]))
# Manual calculation of RNN output
def RNNoutput(Xinput):
    h_state = tf.random_normal([1, n_hidden])  # initial hidden state
    for iX in Xinput:
        h_state = tf.nn.tanh(iX @ W_in + (h_state @ W_rec + b_rec))
    rnn_output = h_state @ W_out + b_out
    return rnn_output
Note that the Manual calculation of RNN output part basically unrolls the hidden state exactly 3 times (once per input letter) using matrix multiplication and the tanh activation function:
tf.nn.tanh(iX @ W_in + (h_state @ W_rec + b_rec))
Here one epoch is completed every time the whole dataset is passed through, so I initialize h_state on every pass. Additionally, note that I have not used a placeholder, which may be a cause of the learning instability.
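For completeness, here is a minimal sketch of the placeholder-based alternative I have in mind (the names X_ph, y_ph, logits_ph, cost_ph are mine and are not used anywhere else in this post):
# Hypothetical placeholder-based variant (not used in the code below)
X_ph = tf.placeholder(tf.float32, [3, None, n_input])   # (sequence, batch, one-hot)
y_ph = tf.placeholder(tf.float32, [None, n_output])     # (batch, one-hot)
logits_ph = RNNoutput(tf.unstack(X_ph, axis=0))          # list of 3 (batch, n_input) tensors
cost_ph = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits_ph, labels=y_ph))
# ... and later feed the data explicitly, e.g.:
# sess.run(cost_ph, feed_dict={X_ph: first3Data, y_ph: last1Data})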
I have used the following code to train the network.
# Cost / optimizer definition
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(logits=RNNoutput(first3Data),
                                               labels=last1Data))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
# Train and keep track of the loss history
sess = tf.Session()
sess.run(tf.global_variables_initializer())
lossHistory = []
for epoch in range(total_epoch):
    _, loss = sess.run([optimizer, cost])
    lossHistory.append(loss)
The resulting learning curve looks as follows. Indeed, it shows an exponential decay.
However, to me it looks too wiggly for such a simple example, showing some instability even late in training.
plt.plot(range(total_epoch), lossHistory)
plt.show()
I think the learning curve should show a square-like, stable decay pattern, as I have seen when using the tensorflow built-in functions (*). I suspected this instability might be explained by the points above, i.e. that RNNoutput unrolls the recurrence with a plain Python for loop over the data rather than with a tensor-level loop, and that no placeholder is used. But I don't think any of these played a crucial role. Is there any other solution to help me out?
(*) I have seen a nearly square-patterned loss decay using the tensorflow built-in functions for a simple RNN. Sorry that I have not included those results for comparison, since I ran out of time; I will try to edit them in shortly.
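(For reference, the built-in construction I mean is roughly the following sketch, assuming BasicRNNCell and static_rnn; it reuses the output layer W_out / b_out defined above, and I have not attached its loss curve here.)
# Rough sketch of the built-in variant (for comparison only)
cell = tf.nn.rnn_cell.BasicRNNCell(n_hidden)
inputs = tf.unstack(tf.cast(tf.constant(first3Data), tf.float32), axis=0)  # 3 tensors of shape (batch, 26)
outputs, state = tf.nn.static_rnn(cell, inputs, dtype=tf.float32)
logits_builtin = outputs[-1] @ W_out + b_out
cost_builtin = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits_builtin,
        labels=tf.cast(tf.constant(last1Data), tf.float32)))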
Upvotes: 0
Views: 187
Reputation: 207
This modification, where the initial hidden state is set to zero (instead of being re-sampled from tf.random_normal at every run), seems to solve the problem.
# Variables (separate version)
W_in = tf.Variable(tf.random_normal([n_input, n_hidden]))
W_rec = tf.Variable(tf.random_normal([n_hidden, n_hidden]))
b_rec = tf.Variable(tf.random_normal([n_hidden]))
W_out = tf.Variable(tf.random_normal([n_hidden, n_output]))
b_out = tf.Variable(tf.random_normal([n_output]))
h_init = tf.zeros([1,n_hidden])
# Manual calculation of RNN output
def RNNoutput(Xinput):
    h_state = h_init  # initial hidden state, now fixed to zeros
    for iX in Xinput:
        h_state = tf.nn.tanh(iX @ W_in + (h_state @ W_rec + b_rec))
    rnn_output = h_state @ W_out + b_out
    return rnn_output
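To check that the fix actually lets the network complete the words, one can run something like the following after training (reusing sess, fourList, alphList, and first3Data from the question; this is only an illustrative check):
# Illustrative prediction check after training
predIdx = sess.run(tf.argmax(RNNoutput(first3Data), axis=1))
for word, p in zip(fourList, predIdx):
    print(word[:3], '->', alphList[p])   # e.g. CAR -> E if training succeeded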
Upvotes: 1