Isilmë O.

Reputation: 1746

Stacked RNN model setup in TensorFlow

I'm kind of lost building a stacked LSTM model for text classification in TensorFlow.

My input data was something like:

x_train = [[1.,1.,1.],[2.,2.,2.],[3.,3.,3.],...,[0.,0.,0.],[0.,0.,0.],
           ......  # I trained the network in batches with batch size set to 32.
          ]
y_train = [[1.,0.],[1.,0.],[0.,1.],...,[1.,0.],[0.,1.]]
# binary classification

The skeleton of my code looks like:

# Imports assumed for this TF 0.x-era skeleton:
# import tensorflow as tf
# from tensorflow.models.rnn import rnn, rnn_cell

self._input = tf.placeholder(tf.float32, [self.batch_size, self.max_seq_length, self.vocab_dim], name='input')
self._target = tf.placeholder(tf.float32, [self.batch_size, 2], name='target')

lstm_cell = rnn_cell.BasicLSTMCell(self.vocab_dim, forget_bias=1.)
lstm_cell = rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=self.dropout_ratio)
self.cells = rnn_cell.MultiRNNCell([lstm_cell] * self.num_layers)
self._initial_state = self.cells.zero_state(self.batch_size, tf.float32)

inputs = tf.nn.dropout(self._input, self.dropout_ratio)
inputs = [tf.reshape(input_, (self.batch_size, self.vocab_dim)) for input_ in
              tf.split(1, self.max_seq_length, inputs)]

outputs, states = rnn.rnn(self.cells, inputs, initial_state=self._initial_state)

# We only care about the output of the last RNN cell...
softmax_w = tf.get_variable("softmax_w", [self.vocab_dim, 2])
softmax_b = tf.get_variable("softmax_b", [2])
y_pred = tf.nn.xw_plus_b(outputs[-1], softmax_w, softmax_b)

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_pred, self._target))
correct_pred = tf.equal(tf.argmax(y_pred, 1), tf.argmax(self._target, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

train_op = tf.train.AdamOptimizer(self.lr).minimize(loss)

init = tf.initialize_all_variables()

with tf.Session() as sess:
    initializer = tf.random_uniform_initializer(-0.04, 0.04)
    with tf.variable_scope("model", reuse=True, initializer=initializer):
        sess.run(init)
        # generate batches here (omitted for clarity)
        print sess.run([train_op, loss, accuracy], feed_dict={self._input: batch_x, self._target: batch_y})

The problem is that no matter how large the dataset is, the loss and accuracy show no sign of improvement (they look completely stochastic). Am I doing anything wrong?

Update:

# First, load the Word2Vec model in gensim.
# (Imports assumed: from gensim.models import Doc2Vec; from gensim.corpora import Dictionary;
#  import codecs; import numpy as np.)
model = Doc2Vec.load(word2vec_path)

# Second, build the dictionary.
gensim_dict = Dictionary()
gensim_dict.doc2bow(model.vocab.keys(), allow_update=True)
w2indx = {v: k + 1 for k, v in gensim_dict.items()}
w2vec = {word: model[word] for word in w2indx.keys()}

# Third, read data from a text file.
for fname in fnames:
    i = 0
    with codecs.open(fname, 'r', encoding='utf8') as fr:
        for line in fr:
            tmp = []
            for t in line.split():
                tmp.append(t)
            X_train.append(tmp)
            i += 1
            if i == samples_count:  # was 'is', which compares identity, not equality
                break

# Fourth, convert words into vectors, and pad each sentence with ZERO arrays to a fixed length.
result = np.zeros((len(data), self.max_seq_length, self.vocab_dim), dtype=np.float32)
for rowNo in xrange(len(data)):
    rowLen = len(data[rowNo])
    for colNo in xrange(rowLen):
        word = data[rowNo][colNo]
        if word in w2vec:
            result[rowNo][colNo] = w2vec[word]
        else:
            result[rowNo][colNo] = [0] * self.vocab_dim
    for colPadding in xrange(rowLen, self.max_seq_length):
        result[rowNo][colPadding] = [0] * self.vocab_dim
return result

# Fifth, generate batches and feed them to the model.
... (trivial details omitted) ...

Upvotes: 2

Views: 1997

Answers (1)

ilblackdragon

Reputation: 1835

Here are a few reasons it may not be training, and some suggestions to try:

  • You are not allowing the word vectors to be updated during training, and the space of pre-learned vectors may not fit this task well.

  • RNNs really need gradient clipping when trained. You can try adding something like the clipping sketch after this list.

  • Unit-scale initialization seems to work better, as it accounts for the size of the layer and allows gradients to be scaled properly as they flow deeper (also shown in the sketch below).

  • You should try removing dropout and the second layer, just to check whether your data pipeline is correct and your loss goes down at all.
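
For the clipping and initialization points, here is a minimal sketch against the code in the question. It reuses the loss and self.lr defined there; max_grad_norm is a hypothetical hyperparameter, and uniform_unit_scaling_initializer is assumed to be available in your TensorFlow version:

# Clip gradients by global norm before applying them with Adam,
# instead of calling minimize() directly.
max_grad_norm = 5.0  # hypothetical value; tune for your model
tvars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), max_grad_norm)
train_op = tf.train.AdamOptimizer(self.lr).apply_gradients(zip(grads, tvars))

# For unit-scale initialization, swap the random_uniform_initializer for:
initializer = tf.uniform_unit_scaling_initializer()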

I can also recommend trying this example with your data: https://github.com/tensorflow/skflow/blob/master/examples/text_classification.py

It trains word vectors from scratch, already has gradient clipping, and uses GRUCells, which are usually easier to train. You can also see nice visualizations of the loss and other metrics by running tensorboard --logdir=/tmp/tf_examples/word_rnn.
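
If you want to try GRUs inside your current code rather than through skflow, here is a minimal sketch assuming the rnn_cell module you already use (GRUCell takes the number of units, just like BasicLSTMCell):

# Swap the LSTM cell for a GRU cell; the rest of the setup stays the same.
gru_cell = rnn_cell.GRUCell(self.vocab_dim)
gru_cell = rnn_cell.DropoutWrapper(gru_cell, output_keep_prob=self.dropout_ratio)
self.cells = rnn_cell.MultiRNNCell([gru_cell] * self.num_layers)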

Upvotes: 1
