Reputation: 734
I know that this is a very broad question, but I have asked many other questions and I have still been unable to properly implement a simple dynamic-k max pooling convolutional neural network as described in this paper. Currently, I am trying to modify the code from this tutorial. I believe I have successfully implemented the dynamic-k part. However, my main problem is because the k value is different for each input, the tensors that are produced are different shapes. I have tried countless things to try and fix this (which is why you may see some funny reshaping), but I can't figure out how. I think that you'd need to pad each tensor to get them all to be the size of the biggest one, but I can't seem to get that to work. Here is my code (I am sorry, it is generally rather sloppy).
# train.py
import datetime
import time
import numpy as np
import os
import tensorflow as tf
from env.src.sentiment_analysis.dcnn.text_dcnn import TextDCNN
from env.src.sentiment_analysis.cnn import data_helpers as data_helpers
from tensorflow.contrib import learn
# Model Hyperparameters
tf.flags.DEFINE_integer("embedding_dim", 128, "Dimensionality of character embedding (default: 128)")
tf.flags.DEFINE_string("filter_sizes", "3,4,5", "Comma-separated filter sizes (default: '3,4,5')")
tf.flags.DEFINE_integer("num_filters", 128, "Number of filters per filter size (default: 128)")
tf.flags.DEFINE_float("dropout_keep_prob", 0.5, "Dropout keep probability (default: 0.5)")
tf.flags.DEFINE_float("l2_reg_lambda", 0.0, "L2 regularizaion lambda (default: 0.0)")
# Training parameters
tf.flags.DEFINE_integer("batch_size", 256, "Batch Size (default: 64)")
tf.flags.DEFINE_integer("num_epochs", 200, "Number of training epochs (default: 200)")
tf.flags.DEFINE_integer("evaluate_every", 100, "Evaluate model on dev set after this many steps (default: 100)")
tf.flags.DEFINE_integer("checkpoint_every", 100, "Save model after this many steps (default: 100)")
# Misc Parameters
tf.flags.DEFINE_boolean("allow_soft_placement", True, "Allow device soft device placement")
tf.flags.DEFINE_boolean("log_device_placement", False, "Log placement of ops on devices")
tf.flags.DEFINE_string("positive_file", "../rotten_tomatoes/rt-polarity.pos", "Location of the rt-polarity.pos file")
tf.flags.DEFINE_string("negative_file", "../rotten_tomatoes/rt-polarity.neg", "Location of the rt-polarity.neg file")
FLAGS = tf.flags.FLAGS
FLAGS._parse_flags()
print("\nParameters:")
for attr, value in sorted(FLAGS.__flags.items()):
print("{} = {}".format(attr.upper(), value))
print("")
# Data Preparatopn
# Load data
print("Loading data...")
x_text, y = data_helpers.load_data_and_labels(FLAGS.positive_file, FLAGS.negative_file)
# Build vocabulary
max_document_length = max([len(x.split(" ")) for x in x_text])
vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)
x = np.array(list(vocab_processor.fit_transform(x_text)))
x_arr = np.array(x_text)
seq_lens = []
for s in x_arr:
seq_lens.append(len(s.split(" ")))
# Randomly shuffle data
np.random.seed(10)
shuffle_indices = np.random.permutation(np.arange(len(y)))
x_shuffled = x[shuffle_indices]
y_shuffled = y[shuffle_indices]
# Split train/test set
x_train, x_dev = x_shuffled[:-1000], x_shuffled[-1000:]
y_train, y_dev = y_shuffled[:-1000], y_shuffled[-1000:]
print("Vocabulary Size: {:d}".format(len(vocab_processor.vocabulary_)))
print("Train/Dev split: {:d}/{:d}".format(len(y_train), len(y_dev)))
# Training
with tf.Graph().as_default():
session_conf = tf.ConfigProto(
allow_soft_placement=FLAGS.allow_soft_placement,
log_device_placement=FLAGS.log_device_placement
)
sess = tf.Session(config=session_conf)
with sess.as_default():
print("HERE")
print(x_train.shape)
dcnn = TextDCNN(
sequence_lengths=seq_lens,
sequence_length=x_train.shape[1],
num_classes=y_train.shape[1],
vocab_size=len(vocab_processor.vocabulary_),
embedding_size=FLAGS.embedding_dim,
filter_sizes=list(map(int, FLAGS.filter_sizes.split(","))),
num_filters=FLAGS.num_filters,
)
# The training procedure
global_step = tf.Variable(0, name="global_step", trainable=False)
optimizer = tf.train.AdamOptimizer(1e-4)
grads_and_vars = optimizer.compute_gradients(dcnn.loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
# Output directory for models and summaries
timestamp = str(int(time.time()))
out_dir = os.path.abspath(os.path.join(os.path.curdir, "runs", timestamp))
print("Writing to {}\n".format(out_dir))
# Summaries for loss and accuracy
loss_summary = tf.scalar_summary("loss", dcnn.loss)
acc_summary = tf.scalar_summary("accuracy", dcnn.accuracy)
# Summaries for training
train_summary_op = tf.merge_summary([loss_summary, acc_summary])
train_summary_dir = os.path.join(out_dir, "summaries", "train")
train_summary_writer = tf.train.SummaryWriter(train_summary_dir, sess.graph)
# Summaries for devs
dev_summary_op = tf.merge_summary([loss_summary, acc_summary])
dev_summary_dir = os.path.join(out_dir, "summaries", "dev")
dev_summary_writer = tf.train.SummaryWriter(dev_summary_dir, sess.graph)
# Checkpointing
checkpoint_dir = os.path.abspath(os.path.join(out_dir, "checkpoints"))
checkpoint_prefix = os.path.join(checkpoint_dir, "model")
# TensorFlow assumes this directory already exsists so we need to create it
if not os.path.exists(checkpoint_dir):
os.makedirs(checkpoint_dir)
saver = tf.train.Saver(tf.all_variables())
# Write vocabulary
vocab_processor.save(os.path.join(out_dir, "vocab"))
# Initialize all variables
sess.run(tf.initialize_all_variables())
def train_step(x_batch, y_batch):
"""
A single training step.
Args:
x_batch: A batch of X training values.
y_batch: A batch of Y training values
Returns: void
"""
feed_dict = {
dcnn.input_x: x_batch,
dcnn.input_y: y_batch,
dcnn.dropout_keep_prob: FLAGS.dropout_keep_prob
}
# Execute train_op
_, step, summaries, loss, accuracy = sess.run(
[train_op, global_step, train_summary_op, dcnn.loss, dcnn.accuracy],
feed_dict
)
# Print and save to disk loss and accuracy of the current training batch
time_str = datetime.datetime.now().isoformat()
print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))
train_summary_writer.add_summary(summaries, step)
def dev_step(x_batch, y_batch, writer=None):
"""
Evaluates a model on a dev set.
Args:
x_batch: A batch of X training values.
y_batch: A batch of Y training values.
writer: The writer to use to record the loss and accuracy
Returns: void
"""
feed_dict = {
dcnn.input_x: x_batch,
dcnn.input_y: y_batch,
dcnn.dropout_keep_prob : 1.0
}
step, summaries, loss, accuracy = sess.run(
[global_step, dev_summary_op, dcnn.loss, dcnn.accuracy],
feed_dict
)
time_str = datetime.datetime.now().isoformat()
print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))
if writer:
writer.add_summary(summaries, step)
# Generate batches
batches = data_helpers.batch_iter(list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)
# Training loop. For each batch...
for batch in batches:
x_batch, y_batch = zip(*batch)
train_step(x_batch, y_batch)
current_step = tf.train.global_step(sess, global_step)
if current_step % FLAGS.evaluate_every == 0:
print("\nEvaluation:")
dev_step(x_dev, y_dev, writer=dev_summary_writer)
print("")
if current_step % FLAGS.checkpoint_every == 0:
path = saver.save(sess, checkpoint_prefix, global_step=current_step)
print("Saved model checkpoint to {}\n".format(path))
And here is the actual DCNN class:
import tensorflow as tf
class TextDCNN(object):
"""
A CNN for NLP tasks. Architecture is as follows:
Embedding layer, conv layer, max-pooling and softmax layer
"""
def __init__(self, sequence_lengths, sequence_length, num_classes, vocab_size, embedding_size, filter_sizes, num_filters):
"""
Makes a new CNNClassifier
Args:
sequence_length: The length of each sentence
num_classes: Number of classes in the output layer (positive and negative would be 2 classes)
vocab_size: The size of the vocabulary, needed to define the size of the embedding layer
embedding_size: Dimensionality of the embeddings
filter_sizes: Number of words the convolutional filters will cover, there will be num_filters for each size
specified.
num_filters: The number of filters per filter size.
Returns: A new CNNClassifier with the given parameters.
"""
# Define the inputs and the dropout
print("SEQL")
print(sequence_length)
self.input_x = tf.placeholder(tf.int32, [None, sequence_length], name="input_x")
self.input_y = tf.placeholder(tf.float32, [None, num_classes], name="input_y")
self.dropout_keep_prob = tf.placeholder(tf.float32, name="dropout_keep_prob")
# Runs the operations on the CPU and organizes them into an embedding scope
with tf.device("/cpu:0"), tf.name_scope("embedding"):
W = tf.Variable( # Make a 4D tensor to store batch, width, height, and channel
tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0),
name="W"
)
self.embedded_chars = tf.nn.embedding_lookup(W, self.input_x)
self.embedded_chars_expanded = tf.expand_dims(self.embedded_chars, -1)
pooled_outputs = []
for i, filter_size in enumerate(filter_sizes):
with tf.name_scope("conv-maxpool-%s" % filter_size):
# Conv layer
filter_shape = [filter_size, embedding_size, 1, num_filters]
# W is the filter matrix
W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="W")
b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="b")
conv = tf.nn.conv2d(
self.embedded_chars_expanded,
W,
strides=[1, 1, 1, 1],
padding="VALID",
name="conv"
)
# Apply nonlinearity
h = tf.nn.relu(tf.nn.bias_add(conv, b), name="relu")
# Max-pooling layer over the outputs
print(sequence_lengths[i] - filter_size + 1)
print(h)
pooled = tf.nn.max_pool(
h,
ksize=[1, sequence_lengths[i] - filter_size + 1, 1, 1],
strides=[1, 1, 1, 1],
padding="VALID",
name="pool"
)
pooled = tf.reshape(pooled, [-1, 1, 1, num_filters])
print(pooled)
pooled_outputs.append(pooled)
# Combine all of the pooled features
num_filters_total = num_filters * len(filter_sizes)
max_shape = tf.reduce_max(pooled_outputs, 1)
print("shapes")
print([p.get_shape() for p in pooled_outputs])
# pooled_outputs = [tf.pad(p, [[0, int(max_shape.get_shape()[0]) - int(p.get_shape()[0])], [0, 0], [0, 0], [0, 0]]) for p in pooled_outputs]
# pooled_outputs = [tf.reshape(p, [-1, 1, 1, num_filters]) for p in pooled_outputs]
# pooled_outputs = [tf.reshape(out, [-1, 1, 1, self.max_length]) for out in pooled_outputs]
self.h_pool = tf.concat(3, pooled_outputs)
self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])
print("here")
print(self.h_pool_flat)
self.h_pool_flat = tf.reshape(self.h_pool, [max(sequence_lengths), num_filters_total])
# Add dropout
with tf.name_scope("dropout"):
# casted = tf.cast(self.dropout_keep_prob, tf.int32)
self.h_drop = tf.nn.dropout(self.h_pool_flat, self.dropout_keep_prob)
self.h_drop = tf.reshape(self.h_drop, [-1, num_filters_total])
# Do raw predictions (no softmax)
with tf.name_scope("output"):
W = tf.Variable(tf.truncated_normal([num_filters_total, num_classes], stddev=0.1), name="W")
b = tf.Variable(tf.constant(0.1, shape=[num_classes]), name="b")
# xw_plus_b(...) is just Wx + b matmul alias
self.scores = tf.nn.xw_plus_b(self.h_drop, W, b, name="scores")
self.predictions = tf.argmax(self.scores, 1, name="predictions")
# Calculate mean cross-entropy loss
with tf.name_scope("loss"):
# softmax_cross_entropy_with_logits(...) calculates cross-entropy loss
losses = tf.nn.softmax_cross_entropy_with_logits(self.scores, self.input_y)
'''print("here")
print(losses.get_shape())
print(self.scores.get_shape())
print(self.input_y.get_shape())'''
self.loss = tf.reduce_mean(losses)
# Calculate accuracy
with tf.name_scope("accuracy"):
correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1))
self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")
I am using the Rotten Tomatoes sentiment labeled data set. The current error I am getting is this:
InvalidArgumentError (see above for traceback): input[1,0] mismatch: 5888 vs. 4864
[[Node: gradients/concat_grad/ConcatOffset = ConcatOffset[N=3, _device="/job:localhost/replica:0/task:0/cpu:0"](concat/concat_dim, gradients/concat_grad/ShapeN, gradients/concat_grad/ShapeN:1, gradients/concat_grad/ShapeN:2)]]
How can I fix this code so that all of the tensors are normalized to the same size after pooling (while keeping pooling dynamic) and so that the code runs to completion?
Sorry about all of the random commented out lines and prints and stuff, but I have tried extensively to make this work.
Upvotes: 2
Views: 3490
Reputation: 93
Although tensorflow doesn't provide k-max pooling directly, I think tf.nn.top_k might help you build that op.
Upvotes: 1
Reputation: 574
There are three things to note here.
max-pooling
and k-max pooling
are two different operations. max-pooling retrieves the maximum valued activation out of the pooling window while k-max pooling retrieves k maximum values from the pooling window.
Tensorflow doesn't provide API for k-max pooling as of now. The one which you are trying now is max-pooling operation and not k-max pooling operation.
As per my knowledge, tensorflow does not provide functionality to handle pooling resulting in different size of matrices. So, you may use bucketing to create batches of sentences of similar length and the use k-max pooling.
Upvotes: 0