user1357015

Reputation: 11686

Tensorflow RNN text generation example tutorial

Looking at this tutorial here, they use a starting sequence of “ROMEO: ”.

print(generate_text(model, start_string=u"ROMEO: "))

However, looking at the actual generation step, is it fair to say it only uses the last character, “ ”? So is it the same whether we use “ROMEO: ” or just “ ”? It’s hard to test because it samples from the output distribution...

Relatedly, it’s unclear how it would predict from such a short string, since the original training sequences are much longer. I understand that if we trained on a history of 100 chars we predict the 101st, then use chars 2–101 to predict 102... but how does it start with just 7 characters?
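The sliding-window scheme that paragraph describes can be sketched in plain Python. Here `next_char` is a dummy stand-in for the trained network, and both names are illustrative rather than from the tutorial:

```python
# Toy sketch of sliding-window generation. `next_char` is a stand-in
# for the trained model; here it just echoes the last character.
def next_char(history):
    return history[-1]

def generate(seed, n, window=100):
    out = seed
    for _ in range(n):
        # Predict from at most the last `window` characters; a short
        # seed simply means the window holds fewer than 100 chars.
        out += next_char(out[-window:])
    return out

print(generate("ROMEO: ", 3))
```

The point of the sketch: the window is an upper bound on the history, not a requirement, so a 7-character seed is enough to start.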

EDIT

As a specific example, I reworked my model to be of the following form:

    model = tf.keras.Sequential()
    model.add(tf.keras.layers.SimpleRNN(units=512, input_shape=(seq_len, 1), activation="tanh"))
    model.add(tf.keras.layers.Dense(len(vocab)))
    model.compile(loss=loss, optimizer='adam')
    model.summary()

Notice that I use a SimpleRNN instead of a GRU and drop the embedding step. Both changes are to simplify the model, but that shouldn't matter here.

My training and output data is as follows:

>>> input_array_reshaped
array([[46., 47., 53., ..., 39., 58.,  1.],
       [ 8.,  0., 20., ..., 33., 31., 10.],
       [63.,  1., 44., ..., 58., 46., 43.],
       ...,
       [47., 41., 47., ...,  0., 21., 57.],
       [59., 58.,  1., ...,  1., 61., 43.],
       [52., 57., 43., ...,  1., 63., 53.]])
>>> input_array_reshaped.shape
(5000, 100)

>>> output_array_reshaped.shape
(5000, 1, 1)

>>> output_array_reshaped
array([[[40.]],

       [[ 0.]],

       [[56.]],

       ...,

       [[ 1.]],

       [[56.]],

       [[59.]]])

However, if I try to predict on a string shorter than 100 characters I get:

ValueError: Error when checking input: expected simple_rnn_1_input to have shape (100, 1) but got array with shape (50, 1)

Below is my prediction function, if needed. If I change required_training_length to anything but 100 it crashes: it specifically requires time_steps of length 100.

Can someone tell me how to adjust the model to make it more flexible as in the example? What subtlety am I missing?

def generateText(starting_corpus, num_char_to_generate=1000, required_training_length=100):
    random_starting_int = random.sample(range(len(text)), 1)[0]
    ending_position = random_starting_int + required_training_length

    starting_string = text[random_starting_int:ending_position]
    print("Starting string is: " + starting_string)
    numeric_starting_string = [char2idx[x] for x in starting_string]
    reshaped_numeric_string = np.reshape(numeric_starting_string, (1, len(numeric_starting_string), 1)).astype('float32')

    output_numeric_vector = []
    for i in range(num_char_to_generate):
        if i % 50 == 0:
            print("Processing character index: " + str(i))
        predicted_values = model.predict(reshaped_numeric_string)
        # sample the next character id from the predicted logits
        selected_predicted_value = tf.random.categorical(predicted_values, num_samples=1)[0][0].numpy()
        output_numeric_vector.append(int(selected_predicted_value))
        # slide the window: drop the oldest timestep, append the new prediction
        reshaped_numeric_string = np.append(reshaped_numeric_string[:, 1:, :], np.reshape(selected_predicted_value, (1, 1, 1)).astype('float32'), axis=1)

    predicted_chars = [idx2char[x] for x in output_numeric_vector]
    final_text = ''.join(predicted_chars)
    return final_text
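One way to get the flexibility asked about here is to leave the time dimension unspecified. A minimal sketch, assuming the tutorial's vocabulary size of 65: passing input_shape=(None, 1) instead of (seq_len, 1) lets the SimpleRNN accept input sequences of any length.

```python
import numpy as np
import tensorflow as tf

vocab_size = 65  # assumption: the tutorial's Shakespeare vocabulary

# `None` in the time dimension means "any sequence length"
model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(units=512, input_shape=(None, 1), activation="tanh"),
    tf.keras.layers.Dense(vocab_size),
])

# Both a 100-step and a 50-step input are now accepted
print(model.predict(np.zeros((1, 100, 1), dtype="float32")).shape)  # (1, 65)
print(model.predict(np.zeros((1, 50, 1), dtype="float32")).shape)   # (1, 65)
```

This works because a SimpleRNN's weights do not depend on the sequence length; only a fixed input_shape bakes the length into the model.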

Upvotes: 1

Views: 404

Answers (1)

Gaslight Deceive Subvert

Reputation: 20372

However, looking at the actual generation step, is it fair to say it only uses the last character, “ ”? So is it the same whether we use “ROMEO: ” or just “ ”? It’s hard to test because it samples from the output distribution...

No, it is taking all characters into consideration. You can easily verify that by using a fixed random seed:

from numpy.random import seed
from tensorflow.random import set_seed

seed(1)
set_seed(1)
print('======')
print(generate_text(model, start_string=u"ROMEO: "))

seed(1)
set_seed(1)
print('======')
print(generate_text(model, start_string=u" "))

Relatedly, it’s unclear how it would predict from such a short string, since the original training sequences are much longer. I understand that if we trained on a history of 100 chars we predict the 101st, then use chars 2–101 to predict 102... but how does it start with just 7 characters?

Internally it runs over the sequence in a loop. It takes the first character and predicts the second, then uses the second to predict the third, and so on. While doing so it updates its hidden state, so its predictions become better and better. Eventually the quality plateaus because the network cannot remember arbitrarily long sequences.
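The state carry-over described above can be illustrated with a toy NumPy RNN cell (the sizes and random weights here are arbitrary, not the tutorial's): the hidden state after consuming a seed depends on every character in it, not just the last one.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, vocab = 8, 5
Wxh = rng.normal(size=(vocab, hidden))   # input-to-hidden weights
Whh = rng.normal(size=(hidden, hidden))  # hidden-to-hidden weights

def run(seed_ids):
    h = np.zeros(hidden)
    for idx in seed_ids:                 # one RNN step per character
        x = np.eye(vocab)[idx]           # one-hot encode the character
        h = np.tanh(x @ Wxh + h @ Whh)   # update the hidden state
    return h

# Two seeds ending in the same character give different final states,
# so the next prediction depends on more than the last character.
print(np.allclose(run([0, 1, 2]), run([3, 3, 2])))  # False
```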

Upvotes: 1
