Reputation: 97
I want to train a neural network that generates text character by character. After some research I have decided to use an LSTM network for this task.
My input is structured as follows: I have a file full of text (roughly 90,000,000 characters), which I slice up into overlapping sequences of 50 characters. Consider this example sentence:
The quick brown fox jumps over the lazy dog
I split the text up into sequences:
The quick brown
he quick brown_
e quick brown f
_quick brown fo
quick brown fox
I added underscores where the spaces wouldn't be displayed in these...
These shall be the time steps for my input data. The output would be the character that comes next after each sequence, so _, f, o, x and _ for the sequences above.
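To make the slicing described above concrete, here is a minimal sketch in plain Python. The helper name `make_sequences` is made up for illustration, and a window of 15 characters is used only so the result matches the example sentence; the real data would use 50.

```python
def make_sequences(text, time_steps):
    """Return overlapping windows of `time_steps` characters and the character after each."""
    inputs, targets = [], []
    # Stop early enough that every window still has a following character.
    for start in range(len(text) - time_steps):
        inputs.append(text[start:start + time_steps])
        targets.append(text[start + time_steps])
    return inputs, targets


sentence = "The quick brown fox jumps over the lazy dog"
inputs, targets = make_sequences(sentence, time_steps=15)

# Show the first five windows together with their target character.
for seq, nxt in list(zip(inputs, targets))[:5]:
    print(repr(seq), "->", repr(nxt))
```

Each window starts one character after the previous one, so a text of N characters yields N - time_steps training pairs.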
Characters are one-hot encoded in a vector whose length equals the number of characters in the dictionary, so with an alphabet consisting of A B C D, the character C would be represented as [0 0 1 0].
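As a small sketch of that encoding, using the four-letter alphabet from the example (the names `alphabet`, `char_to_index` and `one_hot` are illustrative, not taken from my actual code):

```python
alphabet = ["A", "B", "C", "D"]
# Map every character to its position in the alphabet.
char_to_index = {ch: i for i, ch in enumerate(alphabet)}


def one_hot(ch):
    """Return a list with a single 1 at the index of `ch` and 0 everywhere else."""
    vec = [0] * len(alphabet)
    vec[char_to_index[ch]] = 1
    return vec


print(one_hot("C"))  # [0, 0, 1, 0]
```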
Because I can't fit all of the vectorized text in memory at once, I'm breaking it up into batches that contain only a small number of generated sequences for the network to train on.
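One way this batching could look is a generator that vectorizes only one batch at a time. This is a rough sketch under the assumption that the raw text itself fits in memory as a string; `batch_generator` and its arguments are hypothetical names, and nested lists stand in for whatever array type is actually fed to the placeholder.

```python
def batch_generator(text, alphabet, batch_size, time_steps):
    """Lazily yield (x, y) batches; x is [batch_size][time_steps][char_size]."""
    char_to_index = {ch: i for i, ch in enumerate(alphabet)}

    def one_hot(ch):
        vec = [0.0] * len(alphabet)
        vec[char_to_index[ch]] = 1.0
        return vec

    batch_x, batch_y = [], []
    for start in range(len(text) - time_steps):
        window = text[start:start + time_steps]
        batch_x.append([one_hot(ch) for ch in window])       # input sequence
        batch_y.append(one_hot(text[start + time_steps]))    # next character
        if len(batch_x) == batch_size:
            yield batch_x, batch_y
            batch_x, batch_y = [], []
    # A trailing partial batch is dropped so every batch has the same fixed shape.


text = "abcabcabcabc"
alphabet = sorted(set(text))
first_x, first_y = next(batch_generator(text, alphabet, batch_size=4, time_steps=3))
print(len(first_x), len(first_x[0]), len(first_x[0][0]))
```

Only one batch of one-hot vectors exists at any time, which keeps memory use proportional to batch_size * time_steps * char_size rather than to the length of the text.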
With that I get my input placeholder:
X = tf.placeholder(tf.float32, [batch_size, time_steps, char_size])
In the example code below I'm using a batch_size of 128, time_steps of 50 and a char_size of 50 to represent a standard alphabet with 50 letters upper- and lowercase.
num_units passed to BasicLSTMCell was also arbitrarily chosen to be 256 (following some tutorials for my hyperparameters here).
Here is the code:
import tensorflow as tf
batch_size = 128
time_steps = 50
char_size = 50
num_units = 256
sess = tf.InteractiveSession()
X = tf.placeholder(tf.float32, [batch_size, time_steps, char_size])
cell = tf.contrib.rnn.BasicLSTMCell(num_units)
cell = tf.contrib.rnn.MultiRNNCell([cell] * 2, state_is_tuple=True)
output, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
And this is the error message:
Traceback (most recent call last):
File ".\issue.py", line 16, in <module>
output, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\rnn.py", line 598, in dynamic_rnn
dtype=dtype)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\rnn.py", line 761, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2775, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2604, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2554, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\rnn.py", line 746, in _time_step
(output, new_state) = call_cell()
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\rnn.py", line 732, in <lambda>
call_cell = lambda: cell(input_t, state)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 180, in __call__
return super(RNNCell, self).__call__(inputs, state)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\layers\base.py", line 450, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 938, in call
cur_inp, new_state = cell(cur_inp, cur_state)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 180, in __call__
return super(RNNCell, self).__call__(inputs, state)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\layers\base.py", line 450, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 401, in call
concat = _linear([inputs, h], 4 * self._num_units, True)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 1039, in _linear
initializer=kernel_initializer)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1065, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 962, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 360, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1405, in wrapped_custom_getter
*args, **kwargs)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 183, in _rnn_get_variable
variable = getter(*args, **kwargs)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\rnn_cell_impl.py", line 183, in _rnn_get_variable
variable = getter(*args, **kwargs)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 352, in _true_getter
use_resource=use_resource)
File "C:\Users\uidq6096\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 669, in _get_single_variable
found_var.get_shape()))
ValueError: Trying to share variable rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel, but specified shape (512, 1024) and found shape (306, 1024).
I have been racking my brain over this for the past couple of days. What am I missing here?
Upvotes: 1
Views: 73
Reputation: 12918
Initialize your multiple cells in a loop, rather than using the [cell] * n notation:
n = 2  # number of stacked layers (2 in the question)
cells = []
for _ in range(n):
    cells.append(tf.contrib.rnn.BasicLSTMCell(num_units))  # build list of cells, one new cell per layer
cell = tf.contrib.rnn.MultiRNNCell(cells, state_is_tuple=True)  # pass your list of cells
output, state = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
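Otherwise it is basically trying to use the same cell object for every layer, so both layers end up sharing one kernel variable, and the dimensions do not work out: the first layer's kernel has char_size + num_units = 306 rows, while the second layer's needs num_units + num_units = 512 rows, which is exactly the mismatch in your error message. This behavior changed shortly after the 1.0 release, I believe. You used to be able to get away with your original syntax; now you have to create a separate cell per layer.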
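The two shapes in the error message can be reproduced with a bit of arithmetic. The traceback shows the cell calling `_linear([inputs, h], 4 * self._num_units, True)`, so the kernel has one row per element of the concatenated input and hidden state and one column per gate unit. The helper `lstm_kernel_shape` below is just an illustration of that rule, not a TensorFlow function.

```python
def lstm_kernel_shape(input_size, num_units):
    """Shape of the weight matrix applied to the concatenation [inputs, h]."""
    # Rows: input features plus hidden state; columns: 4 gates of num_units each.
    return (input_size + num_units, 4 * num_units)


char_size, num_units = 50, 256

layer0 = lstm_kernel_shape(char_size, num_units)  # first layer sees one-hot characters
layer1 = lstm_kernel_shape(num_units, num_units)  # second layer sees the first layer's output

print(layer0)  # (306, 1024)
print(layer1)  # (512, 1024)
```

Since the two layers need kernels of different shapes, a single shared cell cannot serve both, which is why each layer needs its own cell object.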
Upvotes: 1